Abstract

In this paper we present improved bounds for approximating maximum matchings in bipartite graphs in
the streaming model. First, we consider the question of how well maximum matching can be approximated
in a single pass over the input when Õ(n) space is allowed, where n is the number of vertices in the
input graph. Two natural variants of this problem have been considered in the literature: (1) the edge
arrival setting, where edges arrive in the stream, and (2) the vertex arrival setting, where vertices on one
side of the graph arrive in the stream together with all their incident edges. The latter setting has also
been studied extensively in the context of online algorithms, where each arriving vertex has to either be
matched irrevocably or discarded upon arrival. In the online setting, the celebrated algorithm of
Karp-Vazirani-Vazirani achieves a 1 - 1/e approximation by crucially using randomization (and using Õ(n)
space). Despite the fact that the streaming model is less restrictive in that the algorithm is not constrained
to match vertices irrevocably upon arrival, the best known approximation in the streaming model with
vertex arrivals and Õ(n) space is the same factor of 1 - 1/e.

We show that no (possibly randomized) single pass streaming algorithm constrained to use Õ(n) space
can achieve a better than 1 - 1/e approximation to maximum matching, even in the vertex arrival setting.
This leads to the striking conclusion that no single pass streaming algorithm can get any advantage over
online algorithms unless it uses significantly more than Õ(n) space. Additionally, our bound yields the
best known impossibility result for approximating matchings in the edge arrival model (improving upon
the bound of 2/3 proved by Goel et al. [SODA'12]).

Second, we consider the problem of approximating matchings in multiple passes in the vertex arrival
setting. We show that a simple fractional load balancing approach achieves approximation ratio
$1 - e^{-k}k^{k-1}/(k-1)! = 1 - \frac{1}{\sqrt{2\pi k}} + o(1/k)$ in k passes using linear space. Thus, our algorithm achieves
the best possible 1 - 1/e approximation in a single pass and improves upon the $1 - O(\sqrt{\log\log k / k})$
approximation in k passes due to Ahn and Guha [ICALP'11]. Additionally, our approach yields an efficient
solution to the Gap-Existence problem considered by Charles et al. [EC'10].
1 Introduction



The need to process modern massive data sets necessitates rethinking classical solutions to many combinatorial
optimization problems from the point of view of space usage and of the type of access to the data that algorithms
assume. Applications in domains such as processing web-scale graphs, network monitoring or data mining,
among many others, prohibit solutions that load the whole input into memory and assume random access to
it. The streaming model of computation has emerged as a more realistic model for processing modern data
sets. In this model the input is given to the algorithm as a stream, possibly with multiple passes allowed. The
goal is to design algorithms that require small space and ideally one or a small constant number of passes over
the data stream to compute an (often approximate) solution. For many problems with applications in network
monitoring, it has been shown that space polylogarithmic in the size of the input is often sufficient to compute
very good approximate solutions. On the other hand, even basic graph algorithms have been shown to require
Ω(n) space in the streaming model [5], where n is the number of vertices. A common relaxation is to allow
O(n · polylog(n)) space, a setting often referred to as the semi-streaming model.

1.1 Matchings in the streaming model 

The problem of approximating maximum matchings in bipartite graphs has received significant attention
recently, and very efficient small-space solutions are known when multiple passes are allowed [?] [14] [?] [1] [2]
[11]. The best known algorithm, due to Ahn and Guha [1], achieves a $1 - O(\sqrt{\log\log k / k})$ approximation in k passes for the
weighted as well as the unweighted version of the problem using O(kn) space.

Single pass algorithms. All algorithms mentioned above require at least two passes to achieve a nontrivial 
approximation. The problem of approximating matchings in a single pass has recently received significant 
attention [?] [11]. Two natural variants of this problem have been considered in the literature: (1) the edge
arrival setting, where edges arrive in the stream, and (2) the vertex arrival setting, where vertices on one side
of the graph arrive in the stream together with all their incident edges. The latter setting has also been studied 
extensively in the context of online algorithms, where each arriving vertex has to either be matched irrevocably 
or discarded upon arrival. 

In a single pass, the best known approximation in the edge arrival setting is still 1/2, achieved by simply
keeping a maximal matching (this was recently improved to 1/2 + ε for a constant ε > 0 under the additional
assumption of random edge arrivals [11]). It was shown in [8] that no Õ(n) space algorithm can achieve a
better than 2/3 approximation in this setting.
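As an illustration of the 1/2 baseline mentioned above, here is a minimal one-pass greedy sketch in Python. It assumes edges arrive as (p, q) pairs with p ∈ P and q ∈ Q and that vertex labels on the two sides are distinct; the function name is hypothetical. It keeps a maximal matching, which is well known to contain at least half as many edges as a maximum matching.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (p, q) with p in P and q in Q; labels assumed distinct


def greedy_streaming_matching(stream: Iterable[Edge]) -> List[Edge]:
    """Single pass over an edge stream that maintains a maximal matching.

    Only the matched vertices and the matching itself are stored, i.e. O(n) words.
    """
    matched: Set[str] = set()
    matching: List[Edge] = []
    for p, q in stream:
        # An edge is kept only if neither endpoint is matched yet.
        if p not in matched and q not in matched:
            matching.append((p, q))
            matched.update((p, q))
    return matching


# On the path p2 - q1 - p1 - q2 the maximum matching has two edges,
# but this arrival order leaves the greedy algorithm with only one.
print(greedy_streaming_matching([("p1", "q1"), ("p2", "q1"), ("p1", "q2")]))  # [('p1', 'q1')]
```

The example shows the factor 1/2 is tight for this simple rule: a single early edge blocks both edges of the optimum.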

In the vertex arrival setting, the best known algorithms achieve an approximation of 1 - 1/e. The assumption
of vertex arrivals allows one to leverage results from online algorithms [10] [13] [9]. In the online model,
vertices on one side of the graph are known, and vertices on the other side arrive in an adversarial order. The
algorithm has to either match a vertex irrevocably or discard it upon arrival. The celebrated algorithm of
Karp-Vazirani-Vazirani achieves a 1 - 1/e approximation for the online problem by crucially using randomization
(additionally, this algorithm only uses Õ(n) space). A deterministic single pass Õ(n) space 1 - 1/e
approximation in the vertex arrival setting was given in [8] (such a deterministic solution is provably impossible in
the online setting). In [8], the authors also showed, by analyzing a natural one-round communication problem,
that no single-pass streaming algorithm that uses Õ(n) space can obtain a better than 3/4 approximation in
the vertex arrival setting. They also provided a protocol for this communication problem that matches the 3/4
approximation ratio, suggesting that new techniques would be needed to prove a stronger impossibility result.
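The randomized online rule usually attributed to Karp-Vazirani-Vazirani (RANKING) can be sketched as follows. This is a minimal Python illustration, assuming the offline side is known up front and each arriving vertex comes with its full neighbor list; the function and parameter names are hypothetical. It stores one rank per offline vertex plus the matching, which is consistent with the Õ(n) space bound quoted above.

```python
import random
from typing import Dict, Iterable, List, Optional, Sequence, Set, Tuple


def ranking(offline: Sequence[str],
            arrivals: Iterable[Tuple[str, List[str]]],
            seed: Optional[int] = None) -> Dict[str, str]:
    """RANKING rule: fix a random order of the offline side once, then match
    each arriving vertex to its highest-ranked (smallest rank) free neighbor."""
    rng = random.Random(seed)
    order = list(offline)
    rng.shuffle(order)                      # the only use of randomness
    rank = {v: r for r, v in enumerate(order)}
    taken: Set[str] = set()
    match: Dict[str, str] = {}
    for u, neighbors in arrivals:           # vertex arrival: u with all its edges
        free = [v for v in neighbors if v not in taken]
        if free:                            # irrevocable decision on arrival
            v = min(free, key=rank.__getitem__)
            taken.add(v)
            match[u] = v
    return match


print(ranking(["a", "b", "c"], [("x", ["a", "b"]), ("y", ["a"]), ("z", ["c"])], seed=1))
```

Which of a or b the vertex x receives depends on the random order; averaging over that order is what yields the 1 - 1/e guarantee in the online setting.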

Lop-sided graphs. The techniques for matching problems outlined above yield efficient solutions that use
O(|P| + |Q|) space, where |P| and |Q| are the sizes of the sets in the bipartition. While this is a reasonable
space bound to target, it can be prohibitively expensive for lop-sided graphs that arise, for example, in
applications to ad allocation. Here the P side of the graph corresponds to the set of advertisers, and the Q
side to the set of impressions. An important constraint is that the set of impressions Q may be so large that
it is not feasible to represent it explicitly, ruling out algorithms that take O(|P| + |Q|) space.

Data model for lop-sided graphs. Since the set Q cannot be represented explicitly, it is important to fix 
the model of access to Q. Here we assume the following scenario. Vertices in P arrive in the stream in an 
adversarial order, together with a representation of their edges. We make no assumptions on the way the 
edges are represented. For example, some edges could be stored explicitly, while others may be represented 
implicitly. We assume access to the following two functions: 

1. LIST-NEIGHBORS(u, S) which, given a set of vertices S ⊆ Q and a vertex u ∈ P, lists the neighbors
of u in S;

2. NEW-NEIGHBOR(u, S) which, given a set of vertices S ⊆ Q and a vertex u ∈ P, outputs a neighbor
of u outside of the set S.
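To make the two access functions concrete, here is one possible in-memory stand-in, written in Python. It assumes that each vertex u ∈ P comes with a membership test and a lazy enumerator for its neighbors; these representations, and the names `ImplicitVertex`, `list_neighbors` and `new_neighbor`, are hypothetical illustrative choices and not part of the model, which deliberately makes no assumption on how edges are represented.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Set


class ImplicitVertex:
    """Hypothetical representation of a vertex u in P whose neighborhood in Q
    is never materialized: a membership test plus a lazy enumerator."""

    def __init__(self, is_neighbor: Callable[[int], bool],
                 enumerate_neighbors: Callable[[], Iterator[int]]) -> None:
        self.is_neighbor = is_neighbor
        self.enumerate_neighbors = enumerate_neighbors


def list_neighbors(u: ImplicitVertex, S: Iterable[int]) -> List[int]:
    """LIST-NEIGHBORS(u, S): the neighbors of u inside S (cost proportional to |S|)."""
    return [q for q in S if u.is_neighbor(q)]


def new_neighbor(u: ImplicitVertex, S: Set[int]) -> Optional[int]:
    """NEW-NEIGHBOR(u, S): some neighbor of u outside S, or None if none exists."""
    for q in u.enumerate_neighbors():
        if q not in S:
            return q
    return None


# u is adjacent to every multiple of 3 among 10**9 impressions; Q is never stored.
N = 10**9
u = ImplicitVertex(lambda q: 0 <= q < N and q % 3 == 0, lambda: iter(range(0, N, 3)))
print(list_neighbors(u, [1, 3, 4, 6]), new_neighbor(u, {0, 3}))  # [3, 6] 6
```

Only the query set S is ever held in memory, which matches the requirement that Q not be represented explicitly.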

1.2 Our results 

In this paper, we improve upon the best known bounds for both the single pass and multi-pass settings. In 
the single pass setting, we prove an optimal impossibility result for vertex arrivals, which also yields the best 
known impossibility result in the edge arrival model. For the multipass setting, we give a simple algorithm
that improves upon the approximation obtained by Ahn and Guha in the vertex arrival setting, and also yields
an efficient solution to the Gap-Existence problem considered by Charles et al. [3].

Lower bounds. In this paper we build upon the communication complexity approach taken in [8] to obtain
lower bounds via what can be viewed as multi-party communication complexity. Our main result is an optimal
bound on the best approximation ratio that a single-pass Õ(n) space streaming algorithm can achieve in the
vertex arrival setting:

Theorem 1 No (possibly randomized) one-pass streaming algorithm that outputs a valid matching with probability
at least 3/4 can obtain a better than $(1 - 1/e + \delta)$-approximation to the maximum matching, for any
constant $\delta > 0$, unless it uses at least $n^{1+\Omega(1/\log\log n)}$ space, even in the vertex arrival model.

We note that this bound is matched by the randomized KVV algorithm [10] for the online problem and by the
deterministic Õ(n) space algorithm of [8]. One striking consequence of our bound is that no single-pass
streaming algorithm can improve upon the more constrained online algorithm of KVV, which has to make
irrevocable decisions, unless it uses significantly more than Õ(n) space. Our bound also improves upon the
best known bound of 2/3 for small space one-pass streaming algorithms in the edge arrival model.

Comparison with [8]. It was shown in [8], via an analysis of the natural two-party communication problem,
that no one-pass streaming algorithm that uses Õ(n) space can achieve an approximation better than 2/3 in the
edge arrival setting and 3/4 in the vertex arrival setting. Furthermore, the authors also gave a communication
protocol that proves the optimality of both bounds for the communication problem, thus suggesting that a
more intricate approach would be needed to prove better impossibility results.

In this paper we prove the optimal bound of 1 - 1/e on the best approximation that a single-pass Õ(n)
space algorithm can achieve, even in the vertex arrival setting. While the lower bounds from [8] follow from
a construction of a distribution on inputs that consists of two parts, and hence yields a two-party
communication problem, here we obtain an improvement by constructing hard input sequences that consist of k parts
instead of two, getting a lower bound that approaches 1 - 1/e for large k. This can be viewed as the multi-party
communication complexity of bipartite matching, but we choose to present our lower bound in different
terms for simplicity. We note that extending the approach of [8] to a multi-party setting requires a substantially different
construction. We discuss the difficulties and our approach to overcoming them in Section 2.
Upper bounds. We show that a simple algorithm based on fractional load balancing achieves the optimal
1 - 1/e approximation in a single pass and a $1 - \frac{1}{\sqrt{2\pi k}} + o(k^{-1/2})$ approximation in k passes, improving upon
the best known algorithms for this setting:

Theorem 2 There exists an algorithm for approximating the maximum matching M in a bipartite graph
G = (P, Q, E) with the P side arriving in the stream to within a factor of $1 - e^{-k}k^{k-1}/(k-1)! = 1 - \frac{1}{\sqrt{2\pi k}} + O(k^{-3/2})$
in k passes using O(|P| + |Q|) space and O(m) time per pass.
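The ratio in Theorem 2 is easy to tabulate. The following Python snippet (function names hypothetical) evaluates $1 - e^{-k}k^{k-1}/(k-1)!$ in log space to avoid overflow, using $(k-1)! = \Gamma(k)$, and compares it with the Stirling-based estimate $1 - 1/\sqrt{2\pi k}$. For k = 1 it returns 1 - 1/e, the single-pass guarantee.

```python
import math


def multipass_ratio(k: int) -> float:
    """1 - e^{-k} k^{k-1} / (k-1)!, computed in log space since (k-1)! = Gamma(k)."""
    log_term = -k + (k - 1) * math.log(k) - math.lgamma(k)
    return 1.0 - math.exp(log_term)


def stirling_estimate(k: int) -> float:
    """Leading-order approximation 1 - 1/sqrt(2*pi*k)."""
    return 1.0 - 1.0 / math.sqrt(2 * math.pi * k)


for k in (1, 2, 4, 16, 64):
    print(k, round(multipass_ratio(k), 4), round(stirling_estimate(k), 4))
```

The gap between the two columns shrinks roughly like $k^{-3/2}$, consistent with the error term stated in Theorem 2.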

Remark 3 Note that our algorithm extends trivially to the case when vertices in P have integral capacities
$B_u$, $u \in P$, corresponding to advertiser budgets.

The gap-existence problem. In [3] the authors give an algorithm for the closely related gap-existence problem.
In this problem the algorithm is given a bipartite graph G = (A, I, E), where A is the set of advertisers
with budgets $B_a$, $a \in A$, and I is the set of impressions. The graph is lop-sided in the sense that $|I| \gg |A|$. A
matching M is complete if $|M \cap \delta(i)| = 1$ for all $i \in I$ and $|M \cap \delta(a)| = B_a$ for all $a \in A$, where $\delta(v)$ denotes the set
of edges incident on v. The gap-existence problem consists of distinguishing between two cases:

(YES) there exists a complete matching with budgets $B_a$;

(NO) there does not exist a complete matching with budgets $\lfloor (1 - \epsilon)B_a \rfloor$.

The approach of [3] is via sampling the I side of the graph, and yields a solution that allows for non-trivial
subsampling when the budgets are large. In particular, they obtain an algorithm whose runtime scales inversely
with $\min_a B_a$, which is sublinear in the size of the graph when all budgets are large. In Section 5 we improve significantly
upon their result, showing

Theorem 4 Gap-Existence can be solved in $O(\log(|I| \cdot \sum_{a \in A} B_a)/\epsilon^2)$ passes using space $O(\sum_{a \in A} B_a/\epsilon)$.
The time taken for each pass is linear in the representation of the graph.

It should also be noted that the result of [3] could be viewed as a single pass algorithm, albeit with the stronger
assumption that the arrival order in the stream is random.

Organization: In Section 2 we present the framework of our lower bound, which relies on a special family
of graphs that we refer to as (d, k, δ)-packings. We then give a construction of a (d, k, δ)-packing in Section 3.
Our basic multipass algorithm for approximating matchings is presented in Section 4, and the algorithm for
Gap-Existence is given in Section 5.



2 Single pass lower bound 

In this section we define the notion of a (d, k, δ)-packing, our main tool in proving the lower bound. A
(d, k, δ)-packing is a family of graphs parameterized by the set of root-to-leaf paths in a d-ary tree of height
k, inspired by Ruzsa-Szemerédi graphs, i.e., graphs whose edge set can be partitioned into large induced
matchings. In this section we will show that the existence of a (d, k, δ)-packing with a large number of edges
implies lower bounds on the space complexity of achieving a better than 1 - 1/e approximation to maximum
matching in a single pass over the stream.

We first recall the definitions of induced matchings and ε-Ruzsa-Szemerédi graphs.

Definition 5 Let G = (P, Q, E) denote a bipartite graph. A matching $F \subseteq E$ that matches a set $A \subseteq P$ to a
subset $B \subseteq Q$ is induced if $E \cap (A \times B) = F$.
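Definition 5 can be checked mechanically on small examples. The Python function below is a minimal sketch (the name is hypothetical), assuming edges are stored as (p, q) pairs with p ∈ P and q ∈ Q; it tests that F is a matching and that no edge of E other than those of F runs between its endpoint sets A and B.

```python
from typing import Set, Tuple

Edge = Tuple[int, int]  # (p, q) with p in P and q in Q


def is_induced_matching(E: Set[Edge], F: Set[Edge]) -> bool:
    """Check Definition 5: F is a matching from A to B with E ∩ (A × B) = F."""
    A = {p for p, _ in F}
    B = {q for _, q in F}
    # F is a matching iff no endpoint on either side is repeated.
    if len(A) != len(F) or len(B) != len(F):
        return False
    # Every edge of E between A and B must belong to F (this also forces F ⊆ E).
    return {(p, q) for (p, q) in E if p in A and q in B} == F


E = {(1, 1), (2, 2), (1, 2)}
print(is_induced_matching(E, {(1, 1), (2, 2)}))  # False: edge (1, 2) lies inside A x B
print(is_induced_matching(E, {(2, 2)}))          # True
```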



Definition 6 A bipartite graph G = (P, Q, E) with |P| = |Q| = n is an ε-Ruzsa-Szemerédi graph if one can
write $E = \bigcup_i M_i$, where each $M_i$ is an induced matching and $|M_i| = \epsilon n$ for all i.

Several constructions of Ruzsa-Szemerédi graphs with a large number of edges are known. We will use the
techniques pioneered in [?], where the authors construct ε-Ruzsa-Szemerédi graphs with constant ε < 1/3,
and the extensions developed in [8], where it is proved that

Theorem 7 [8] For any constant $\delta \in (0, 1/2)$ there exist bipartite $(1/2 - \delta)$-Ruzsa-Szemerédi graphs on 2n
nodes with $n^{1+\Omega(1/\log\log n)}$ edges.

In the rest of the section we define a distribution on input instances for our problem of approximating maximum
matchings in a single pass in the streaming model. We start by providing intuition for our distribution.
It is useful to first recall how the best known lower bound of 3/4 for the same setting is proved in [8]. The
stream in [8] consists of two 'phases'. In the first phase, the algorithm is presented with a graph G = (P, Q, E)
such that |P| = n, |Q| = 2n and the edge set E can be represented as a union of induced 2-matchings $M_i$,
$i = 1, \dots, k$, $k = n^{\Omega(1/\log\log n)}$, where $M_i$ matches a subset $A_i \subseteq P$ such that $|A_i| \ge (1/2 - \delta)n$ to a subset
$B_i \subseteq Q$, $|B_i| = (1 \pm \delta)n$. Then an index i is chosen uniformly at random from [1 : k], and in the second
part of the stream a matching arrives that matches a new set of vertices $P^*$ to $Q^* = Q \setminus B_i$, making the
edges of the (uniformly random) matching $M_i$ crucial for constructing a better than 3/4 approximation to the
maximum matching in the whole instance. It is then shown, using an additional randomization trick, that the
algorithm essentially needs to store Ω(1) bits for each edge in each induced matching $M_i$ if it beats the 3/4
approximation ratio.

We generalize this approach by constructing hard distributions on inputs that consist of multiple phases,
for which any algorithm that achieves a better than 1 - 1/e approximation is essentially forced to remember
Ω(1) bits per edge of the input graph. Ensuring that this is the case is the main challenge in generalizing the
construction in [8] to a multiphase setting. We address this challenge using the notion of a (d, k, δ)-packing,
which we now define.

2.1 (d, k, δ)-packing

Let $\mathcal{T}$ denote a d-ary tree of height k. A (d, k, δ)-packing will be defined as a function mapping root-to-leaf
paths p in $\mathcal{T}$ to bipartite graphs on the vertex set (T, S), where T and S are the two sides of the bipartition.
We will write G(p) to denote the graph that a path p is mapped to by the packing.

The vertex set of G(p) for each root-to-leaf path p will always be (T, S), so that the choice of p determines
the set of edges of the graph. We partition the set S as $S = S_0 \cup \dots \cup S_{k-1} \cup S_k$ (the sets $S_i$, $i = 0, \dots, k$, are
disjoint and correspond to the k + 1 'phases' of the input instance). We will always have $|T| = (1 + O(\delta))|S|$
for an arbitrarily small constant δ > 0.
Figure 1: A root-to-leaf path in the d-ary tree $\mathcal{T}$. Thick solid edges represent the edges of the path $(r = u_0, u_1, u_2)$. Dashed
edges incident on nodes on the path correspond to subgraphs $H_i^w$ for i = 0, 1 and w a child of $u_i$.

Figure 2: Subgraphs $(T^{u_i}, S_i)$ that arrive in the stream. The edges of the induced near-regular subgraph $H_0^{u_1}$
induced by $(T^{u_0} \setminus T^{u_1}) \cup S_0^{u_1}$ are shown in bold.

We now associate several sets of vertices on the T and S sides of the bipartition with each node in the
d-ary tree $\mathcal{T}$. Let $u \in \mathcal{T}$ be a node at distance $i \in [0, k]$ from the root. The following sets are associated
with u:

1. a subset $T^u \subseteq T$, such that if w is a child of u in $\mathcal{T}$, one always has $T^w \subseteq T^u$;

2. for each $j \in [0 : i - 1]$, a set $S_j^u \subseteq S_j$ such that if w is a child of u in $\mathcal{T}$, one always has $S_j^w \subseteq S_j^u$. To
simplify notation, we set $S_i^u := S_i$.

We now describe the hard inputs that we will use. The input sequence is split into k + 1 phases. The
i-th phase corresponds to the i-th vertex $u_i$ on the path p from the root to a leaf, where i = 0, ..., k (see Fig. 2).
During phase i the edges of the graph $G_i(p) = (T^{u_i}, S_i, E_i(p))$ arrive in the stream. Crucially,
the graph $G_i(p)$ will be a union of induced sparse subgraphs indexed by the children of $u_i$.

This setup is illustrated in Fig. 1, where (a) all edges of the path $p = (r = u_0, u_1, u_2)$ are shown in bold
and (b) all edges of $\mathcal{T}$ that are incident on nodes of p are dashed, since the corresponding subgraphs $H_i^w$ arrive
in the stream. The path p yields a nested sequence $T = T^{u_0} \supseteq T^{u_1} \supseteq \dots \supseteq T^{u_k}$, shown in Fig. 2.

The reason why this construction presents a hard instance for small space algorithms is as
follows. At each step i the algorithm is presented with all the subgraphs $H_i^w$, of which only the uniformly
random one corresponding to the next node on the path p, i.e., $H_i^{u_{i+1}}$, will be useful for constructing a large
matching in the whole instance. Large here means a matching of size at least a $(1 - (1 - 1/k)^k + \delta')$ fraction of
the maximum for some constant δ' > 0. To show that only these special subgraphs are useful for constructing
a large matching, we will later exhibit a directed cut of appropriate size in the graph G(p) that consists only
of the edges of $H_i^{u_{i+1}}$, i = 0, ..., k - 1 (see Lemma 13). The key to exhibiting such a cut is the special
structure of the sets $S_i^{u_k}$ for i = 0, ..., k - 1 that we define in property (2) of (d, k, δ)-packings below. An
additional randomization trick will allow us to show that a construction of a (d, k, δ)-packing immediately
yields a lower bound of essentially Ω(dn) on the space required for a single-pass algorithm to achieve an
approximation ratio better than $1 - (1 - 1/k)^k + \delta'$ for a constant δ' > 0.

We now transform the intuitive description above into a formal argument. We will use the following
definition.
Definition 8 We call a bipartite graph G = (P, Q, E) (a, b, δ)-almost regular if (1) at most a δ fraction of
vertices in P have degree outside of $[(1 - \delta)a, (1 + \delta)a]$, and no vertex has degree larger than $(1 + \delta)a$, and
(2) at most a δ fraction of vertices in Q have degree outside of $[(1 - \delta)b, (1 + \delta)b]$, and no vertex has degree
larger than $(1 + \delta)b$.

Definition 9 ((d, k, δ)-packing) A mapping from the set of root-to-leaf paths p in a d-ary tree $\mathcal{T}$ to the set of
bipartite graphs G(p) = (T, S, E(p)) is a (d, k, δ)-packing if the following conditions are satisfied.

Let $p = (r = u_0, u_1, \dots, u_k)$ be a root-to-leaf path in $\mathcal{T}$. Let G(p) = (T, S, E(p)) denote the graph that
the path p is mapped to. Then the nested sequences of sets $T = T^{u_0} \supseteq T^{u_1} \supseteq \dots \supseteq T^{u_{k-1}} \supseteq T^{u_k}$ and
$S_i = S_i^{u_i} \supseteq S_i^{u_{i+1}} \supseteq \dots \supseteq S_i^{u_k}$ satisfy the following properties for all i = 0, ..., k - 1:

1. For a constant γ > 0, one has for every child w of $u_i$ in the tree $\mathcal{T}$ that the subgraph $H_i^w$ induced by
$(T^{u_i} \setminus T^w) \cup S_i^w$ is $((k - 1)\gamma, k\gamma, \delta)$-almost regular.

2. there exists a set $Z^{u_i} \subseteq T$ such that $|Z^{u_i}| \le O(\delta/k^2)|T^{u_i}|$, and the subgraph induced by
$(T^{u_i} \setminus (T^{u_k} \cup Z^{u_i})) \cup S_i^{u_k}$ contains only the edges of $H_i^{u_{i+1}}$;

3. there exists a matching of at least a 1 - δ fraction of $S_i$ to $T^{u_i} \setminus T^{u_{i+1}}$;

4. $|T^{u_i}| = (1 + O(\delta))(1 - 1/k)^{-k+i}n$ and $|S_i^{u_j}| = (1 + O(\delta))(1 - 1/k)^{-k+j}n/k$ for all j = i, ..., k - 1;

5. there exists a matching of at least a 1 - δ fraction of $S_k$ to $T^{u_k}$.

Furthermore, for each i = 0, ..., k the edge set of the subgraph induced by $T^{u_i} \cup S_i$ only depends on the
nodes of p at distance at most i from the root.

Remark 10 One could replace property (1) with the requirement that $H_i^w$ be a matching of a $1 - O(\delta)$
fraction of $S_i^w$ to $T^{u_i} \setminus T^w$, and still get a lower bound that tends to 1 - 1/e for large k, albeit with slightly
worse convergence. We prefer to use the more complicated definition to obtain the clean approximation ratio
$1 - (1 - 1/k)^k + O(\delta)$, where δ can be chosen to be an arbitrarily small constant, for any k > 1.

In what follows we will often refer to properties of (d, k, δ)-packings by number, without specifying each
time that Definition 9 is meant.

In the rest of this section we will show that the existence of large (d, k, δ)-packings implies space lower
bounds for approximating matchings in one pass in the streaming model, thus proving the following theorem.

Theorem 11 If a (d, k, δ)-packing with Θ(n) vertices exists for a sufficiently large constant k > 0 and
$\delta = O(1/k^3)$, then no one-pass streaming algorithm can obtain a better than $(1 - (1 - 1/k)^k + \delta')$-approximation
for any constant δ' > 0 in space o(nd), even when vertices on one side of the graph arrive in the stream
together with all their edges.

Together with the construction of a (d, k, δ)-packing with $d = n^{\Omega(1/\log\log n)}$ and $\delta = O(1/k^3)$ given in
Section 3, this will yield a proof of Theorem 1.
2.2 Distribution over inputs

We now formally define the (random) input graph X = (P, Q, E) based on a (d, k, δ)-packing. We will always
have $P = \bigcup_{i=0}^{k} S_i$ and Q = T, but it will be useful to have notation for the parts P and Q of the bipartition
of X. Let $p = (r = u_0, u_1, \dots, u_k)$ denote the path from the root of $\mathcal{T}$ to a uniformly random leaf. Let
G = G(p) denote the graph that the path p is mapped to by our (d, k, δ)-packing.

Let $T = T^{u_0} \supseteq T^{u_1} \supseteq T^{u_2} \supseteq \dots \supseteq T^{u_k}$ denote the sequence of subsets of T corresponding to p.
For each i = 0, ..., k - 1 and each child w of $u_i$, let $H_i^w = (X_i^w, Y_i^w, E_i^w)$ denote the almost regular graph
induced by $X_i^w \cup Y_i^w$, where $X_i^w = T^{u_i} \setminus T^w$ and $Y_i^w = S_i^w$.

We now introduce some randomness into the graph $H_i^w$. Let $\tilde H_i^w$ be obtained from $H_i^w$ via the following
subsampling process. For each i and w, let $K_i^w$ denote a uniformly random subset of $X_i^w$ of size $\delta|X_i^w|$ for a
small constant δ. Let $b_x^{i,w} = 1$ if $x \in K_i^w$ and $b_x^{i,w} = 0$ otherwise. Then for each $x \in X_i^w$ the graph $\tilde H_i^w$ contains all edges
incident on x in $H_i^w$ if $b_x^{i,w} = 0$, and none of the edges incident on x otherwise. For each i = 0, ..., k - 1 let
$b_i = (b_x^{i,u_{i+1}})_{x \in X_i^{u_{i+1}}}$. Note that $\tilde H_i^w$ is $((k - 1)\gamma, k\gamma, O(\delta))$-almost regular. For each i = 0, ..., k - 1 let
$G_i(p) = (T^{u_i}, S_i, E_i(p))$ denote the subgraph with bipartition $(T^{u_i}, S_i)$ such that $E_i(p)$ is the union of the
edges of all graphs $\tilde H_i^w$ over all children w of $u_i$. Let $G_k(p) = (T^{u_k}, S_k, E_k(p))$ be a subgraph that consists of
a perfect matching between $S_k$ and $T^{u_k}$ (see Fig. 2). The instance X is the union of $G_i(p)$ over i = 0, ..., k.
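The subsampling step that turns $H_i^w$ into $\tilde H_i^w$ can be sketched in a few lines of Python. This is an illustration only: it handles a single graph $H_i^w$ given as an explicit edge list of (x, y) pairs with x ∈ $X_i^w$, rounds $\delta|X_i^w|$ down (an assumption, since the text leaves rounding unspecified), and the function name `subsample` is hypothetical. The phase-i edge set $E_i(p)$ would be the union of the kept edges over all children w of $u_i$.

```python
import random
from typing import Dict, Hashable, List, Sequence, Tuple

Edge = Tuple[Hashable, Hashable]  # (x, y) with x in X_i^w and y in Y_i^w


def subsample(X: Sequence[Hashable], edges: List[Edge], delta: float,
              rng: random.Random) -> Tuple[List[Edge], Dict[Hashable, int]]:
    """Pick K uniformly among subsets of X of size floor(delta*|X|), set b_x = 1 on K,
    and keep exactly the edges whose X-endpoint has b_x = 0."""
    K = set(rng.sample(list(X), int(delta * len(X))))
    bits = {x: int(x in K) for x in X}
    kept = [(x, y) for (x, y) in edges if bits[x] == 0]
    return kept, bits


X = list(range(10))
H = [(x, y) for x in X for y in ("y0", "y1", "y2")]
kept, bits = subsample(X, H, 0.2, random.Random(0))
print(sum(bits.values()), len(kept))  # 2 24
```

The bit vector `bits` plays the role of the vectors $b_x^{i,w}$; hiding it from the algorithm is what the lower-bound argument exploits.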

We now specify the order in which the vertices appear in the stream. The stream will consist of k + 1 
phases. For each i = 0, ..., k the vertices and edges of $G_i(p)$ arrive in phase i in an arbitrary order.

This completes the description of the input. We now turn to proving Theorem 11. We will need the
following claim.

Claim 12 G contains a matching of size at least $(1 - O(\delta))(1 - 1/k)^{-k}n$.

Proof: It is sufficient to match a 1 - δ fraction of $S_i$ to $T^{u_i} \setminus T^{u_{i+1}}$ for all i = 0, ..., k - 1, as guaranteed
by property (3), and to match the vertices in $T^{u_k}$ to $S_k$. This matches a 1 - O(δ) fraction of T, and hence yields
the required matching, since $|T| = |T^{u_0}| = (1 + O(\delta))(1 - 1/k)^{-k}n$ by property (4). ■

2.3 Bounding performance of a small space algorithm 

By Yao's minimax principle it is sufficient to upper bound the performance of a deterministic small space
algorithm that succeeds with probability at least 1/2. To do that, we bound the size of the matching that a
small space algorithm can output at the end of the stream. Let E* denote the set of edges that the algorithm
outputs at the end of the stream. We first upper bound the approximation ratio that the algorithm obtains in
terms of the number of edges in $E(\tilde H_i^{u_{i+1}}) \cap E^*$, where $p = (u_0, u_1, \dots, u_k)$ is the uniformly random path
from the root to a leaf in $\mathcal{T}$.

Lemma 13 The size of the matching output by the algorithm is bounded by

$$\left((1 - 1/k)^{-k} - 1\right)n + \sum_{i=0}^{k-1} |E(\tilde H_i^{u_{i+1}}) \cap E^*| + O(\delta k^2 n).$$

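Proof: Consider the cut (A, B), where $A = \left(T^{u_0} \setminus \left(T^{u_k} \cup \bigcup_{i=0}^{k-1} Z^{u_i}\right)\right) \cup \bigcup_{i=0}^{k-1}(S_i \setminus S_i^{u_k})$ and
$B = T^{u_k} \cup S_k \cup \bigcup_{i=0}^{k-1} S_i^{u_k} \cup \bigcup_{i=0}^{k-1} Z^{u_i}$. Here the $Z^{u_i}$ are the sets whose existence is guaranteed by property (2).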

By the max-flow/min-cut theorem, the size of the matching output by the algorithm is bounded by $|A \cap P| +
|B \cap Q| + |((A \cap Q) \times (B \cap P)) \cap E^*|$. Furthermore, by property (2) in Definition 9, for the sets A and B one
has, using the fact that there are no edges from $S_k$ to $T \setminus T^{u_k}$, that $((A \cap Q) \times (B \cap P)) \cap E \subseteq \bigcup_{i=0}^{k-1} E(\tilde H_i^{u_{i+1}})$,
and hence

$$|((A \cap Q) \times (B \cap P)) \cap E^*| \le \sum_{i=0}^{k-1} |E(\tilde H_i^{u_{i+1}}) \cap E^*|. \quad (1)$$

Combining these estimates, we get that the size of the matching output by the algorithm is bounded by

$$\sum_{i=0}^{k-1}\left(|S_i| - |S_i^{u_k}|\right) + |T^{u_k}| + \sum_{i=0}^{k-1} |Z^{u_i}| + \sum_{i=0}^{k-1} |E(\tilde H_i^{u_{i+1}}) \cap E^*|.$$

Recall that $|S_i| = (1 + O(\delta))(1 - 1/k)^{-k+i}n/k$ and $|S_i^{u_k}| = (1 + O(\delta))n/k$ by property (4). Thus, the first term
is at most

$$(1 + O(\delta))\sum_{i=0}^{k-1}\left((1 - 1/k)^{-k+i} - 1\right)\frac{n}{k} = (1 + O(\delta))\left((1 - 1/k)^{-k}\,\frac{1 - (1 - 1/k)^k}{1/k} - k\right)\frac{n}{k} = (1 + O(\delta))\left((1 - 1/k)^{-k} - 2\right)n.$$

Recalling that $|T^{u_k}| = (1 + O(\delta))n$ by property (4) and $|Z^{u_i}| = O(k\delta)n$ by property (2) completes the proof.

■ 
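The geometric-sum step in the proof of Lemma 13 and the resulting ratio can be checked numerically. The Python sketch below (names hypothetical) ignores the $(1 + O(\delta))$ factors and the $O(\delta k^2 n)$ term, takes the sizes from property (4) in units of n, and assumes k ≥ 2.

```python
def first_term(k: int) -> float:
    """sum_{i=0}^{k-1} ((1 - 1/k)^{-k+i} - 1) / k, in units of n (O(delta) factors dropped)."""
    q = 1 - 1 / k
    return sum((q ** (-k + i) - 1) / k for i in range(k))


def closed_form(k: int) -> float:
    """(1 - 1/k)^{-k} - 2, the value obtained at the end of the proof of Lemma 13."""
    return (1 - 1 / k) ** (-k) - 2


def lemma13_ratio(k: int) -> float:
    """((1-1/k)^{-k} - 1) n divided by the matching size (1-1/k)^{-k} n from Claim 12."""
    t = (1 - 1 / k) ** (-k)
    return (t - 1) / t


for k in (2, 5, 50, 1000):
    print(k, first_term(k), closed_form(k), lemma13_ratio(k))
```

Dividing the bound of Lemma 13 (with the retained edges of the graphs $\tilde H_i^{u_{i+1}}$ contributing o(n)) by the matching size from Claim 12 gives exactly $1 - (1 - 1/k)^k$, which tends to 1 - 1/e as k grows.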

We now show that no small space algorithm that is correct with probability at least 1/2 can output more
than a vanishingly small fraction of the edges in $\bigcup_{i=0}^{k-1} E(\tilde H_i^{u_{i+1}})$. Recall that the vectors of bits flipped in the
subsampling process that correspond to vertices (and their edge neighborhoods) in $\tilde H_i^{u_{i+1}}$ are denoted by $b_i$.

Lemma 14 Let X denote the distribution on input graphs obtained from a (d, k, δ)-packing for constant k
and $\delta = O(1/k^3)$. Let $\mathcal{A}$ be an o(nd) space algorithm that is correct with probability at least 1/2. Then
the expected number of edges in $\bigcup_{i=0}^{k-1} E(\tilde H_i^{u_{i+1}})$ retained by $\mathcal{A}$, conditional on $\mathcal{A}$ being
correct, is o(n).

Proof: 

We give the algorithm the following information for free. At the end of phase i the algorithm knows all 
vectors $u_0, \dots, u_i$ on the path chosen in the distribution (of course, the algorithm does not know $u_{i+1}$). This
only makes the algorithm more powerful. 

Let $\mathcal{Q}_i$ denote the set of phase i graphs, i.e., the set of possible graphs on the vertices $T^{u_i} \cup S_i$. Since
the algorithm knows all vectors $u_0, \dots, u_i$, these graphs are solely determined by the choices made in the
subsampling process in $H_i^w$ for each w. Denote the state of the memory of the algorithm after the i-th phase, for
i = 0, ..., k - 1, by $m_i$. For each i between 0 and k - 1 we denote the function that maps $m_{i-1}$ and the graph
$G_i = (T^{u_i}, S_i, E_i) \in \mathcal{Q}_i$ to $m_i$ by $\psi_i : \{0, 1\}^s \times \mathcal{Q}_i \to \{0, 1\}^s$, where s is the number of bits of space that the
algorithm uses. Without loss of generality, assume $m_{-1} = 0$.

Denote by E* the set of edges that the algorithm outputs at the end of the stream. Denote the event that
the algorithm is correct by C. Let $E_i^* := E^* \cap (S_i \times Q)$. Let $M_i \in \{0, 1\}^s$ denote the (random) state of the
memory of the algorithm at the end of phase i. Let $\mathcal{D} := \{|E^*| = \Omega(n)\} \wedge C$ and $\mathcal{D}_i := \{|E_i^*| = \Omega(1/k)n\} \wedge C$.

We prove the lemma by contradiction. Suppose that, conditional on being correct, the algorithm retains
Ω(n) edges of $\bigcup_{i=0}^{k-1} E(\tilde H_i^{u_{i+1}})$. Then a simple averaging argument using the assumption that Pr[C] ≥ 1/2
shows that $\Pr[\mathcal{D}] = \Omega(1)$ and that there exists $j \in [0 : k - 1]$ such that $\Pr[\mathcal{D}_j] \ge C/k$ for a constant C > 0.
We will now concentrate on phase j. Denote the set of good memory configurations by
$\mathcal{G} = \{(m_{j-1}, m_j) \in \{0, 1\}^s \times \{0, 1\}^s : \Pr[\mathcal{D}_j \mid M_{j-1} = m_{j-1}, M_j = m_j] \ge C/(2k)\}$. Thus, $\mathcal{G}$ is the set of memory configurations in the
(j - 1)-st and j-th phases such that, conditional on $(M_{j-1}, M_j) \in \mathcal{G}$, the algorithm outputs many
edges of $\bigcup_{i=0}^{k-1} E(\tilde H_i^{u_{i+1}})$ with probability at least C/(2k). Then

$$\Pr[(M_{j-1}, M_j) \in \mathcal{G}] + \frac{C}{2k}\Pr[(M_{j-1}, M_j) \notin \mathcal{G}] \ge \Pr[\mathcal{D}_j] \ge C/k,$$

so

$$\Pr[(M_{j-1}, M_j) \in \mathcal{G}] \ge C/(2k). \quad (2)$$
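For completeness, the step from the preceding inequality to (2) only uses the trivial bound $\Pr[(M_{j-1}, M_j) \notin \mathcal{G}] \le 1$:

```latex
\Pr[(M_{j-1},M_j)\in\mathcal{G}] + \frac{C}{2k}
  \;\ge\; \Pr[(M_{j-1},M_j)\in\mathcal{G}] + \frac{C}{2k}\,\Pr[(M_{j-1},M_j)\notin\mathcal{G}]
  \;\ge\; \frac{C}{k}
\quad\Longrightarrow\quad
\Pr[(M_{j-1},M_j)\in\mathcal{G}] \;\ge\; \frac{C}{k} - \frac{C}{2k} \;=\; \frac{C}{2k}.
```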

Before proceeding, we prove an auxiliary claim. Recall that, in the construction of the input distribution, for
all d children w of $u_i$ the graph $\tilde H_i^w$ is obtained from $H_i^w$ by keeping the edges incident to a uniformly random
subset of a 1 - δ fraction of the nodes in $X_i^w$. Thus, there are at least $\prod_{w}\binom{|X_i^w|}{\delta|X_i^w|} = 2^{\eta dn}$ graphs in $\mathcal{Q}_i$, where η > 0
is a constant. The following claim follows similarly to [8]. We give a proof here for completeness.

Claim 15 Let α > 0 be a constant and let $\mathcal{F}$ be any subset of $\mathcal{Q}_i$. Let $G_{\mathcal{F}}$ denote a set of edges that are
contained in at least 1/2 of the graphs in $\mathcal{F}$. Let $J \subseteq [1 : d]$ be the set of indices j such that $G_{\mathcal{F}}$ contains at
least $\alpha|X_i^w|$ edges from $\tilde H_i^w$, where w is the j-th child of $u_i$. Then if $|\mathcal{F}| \ge 2^{(\eta - o(1))dn}$,
$|J| = o(d)$.

Proof: Let $|J| = d_1$. Recall that by property (1) the maximum degree in $\tilde H_i^w$ is bounded above by
$c := (1 + O(\delta))\gamma k$. Thus, the number of graphs that can be in $\mathcal{F}$ is bounded by

$$\binom{(1 - \alpha/c)|X_i^w|}{\delta|X_i^w|}^{d_1}\binom{|X_i^w|}{\delta|X_i^w|}^{d - d_1} = \left(2^{-\Omega(|X_i^w|)}\binom{|X_i^w|}{\delta|X_i^w|}\right)^{d_1}\binom{|X_i^w|}{\delta|X_i^w|}^{d - d_1} = 2^{-\Omega(d_1 n)}\,2^{\eta dn}.$$

It then follows that if $d_1 = \Omega(d)$, we have $|\mathcal{F}| \le 2^{(\eta - \Omega(1))dn}$, contradicting our assumption on the size of
$\mathcal{F}$. ■

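Let $\mathcal{E}_j$ denote the event that $|\psi_j^{-1}(M_j)| < 2^{(\eta - o(1))dn}$, where $\psi_j^{-1}(M_j)$ denotes the set of graphs
$H \in \mathcal{Q}_j$ with $\psi_j(M_{j-1}, H) = M_j$. A simple counting argument shows that for a
uniformly random graph $H \in \mathcal{Q}_j$ we have $\Pr[\mathcal{E}_j] = o(1)$ (here we use the fact that the coin flips that determine
which edges belong to $\tilde H_j^w$ are independent of $M_{j-1}$). Combining this with (2), we get

$$\Pr[(M_{j-1}, M_j) \notin \mathcal{G}] + \Pr[\mathcal{E}_j] \le 1 - C/(2k) + o(1) < 1.$$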

Thus, there exist $m_{j-1}^*, m_j^* \in \{0, 1\}^s$ such that the following properties hold:

(P1) $\Pr[\mathcal{D}_j \mid M_{j-1} = m_{j-1}^*, M_j = m_j^*] \ge C/(2k)$;

(P2) $|\psi_j^{-1}(m_j^*)| \ge 2^{(\eta - o(1))dn}$.

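We can now complete the proof. For brevity let $\mathcal{M} := \{M_{j-1} = m_{j-1}^*, M_j = m_j^*\}$. Recall that $E_j^*$ is the
set of edges from $\tilde H_j^{u_{j+1}}$ that the algorithm outputs at the end of the stream. We have

$$\mathbb{E}_{E_j^*}\left[\Pr[\mathcal{D}_j \mid \mathcal{M}, E_j^*]\right] \ge C/(2k),$$

and so there exists $\hat{E}_j$ such that $\Pr[\mathcal{D}_j \mid \mathcal{M} \wedge \{E_j^* = \hat{E}_j\}] \ge C/(2k)$.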

Now recall that $\mathcal{D}_j = \{|E^*_j| = \Omega(1/k)n\} \wedge \mathcal{C}$. Thus, we have isolated memory configurations $m^*_{j-1}$ and $m^*_j$ and a set of edges $E^*$ of size $\Omega(1/k)n$ such that the algorithm can output $E^*$ and be correct with probability at least $C/k$ conditional on $M_{j-1} = m^*_{j-1}$ and $M_j = m^*_j$.

Finally, note that conditional on $\mathcal{M} \wedge \{E^*_j = E^*\}$ all graphs $H \in \psi^{-1}(m^*_j)$ are equiprobable. Now using property (P2) above together with Claim 15 we conclude that $|E^*| = o(n)$, which is a contradiction. ■



We can now give the proof of Theorem 11.

Proof of Theorem 11: The proof follows by combining Claim 12, Lemma 13 and Lemma 14 after setting $\delta = c\delta'/k^2$ for a small constant $c > 0$. ■

3 Construction of a $(d, k, \delta)$-packing



In this section we give a construction of a $(d, k, \delta)$-packing on $O(n)$ nodes with $d = n^{\Omega(1/\log\log n)}$ for any constant $k$ and sufficiently small constant $\delta > 0$. Our construction will use many of the techniques introduced in [7] and (the full version of) [8].

We first introduce notation. As before, the sides of the bipartition of the graph $G(p)$ that we need to construct are denoted by $T$ and $S = S_0 \cup \ldots \cup S_k$. We use the notation $[a] = \{1, \ldots, a\}$ for integer $a \ge 1$. In our construction the $T = T^{u_0}$ side of the graph is identified with a hypercube $[m^4]^m$ for a value of $m$ to be chosen later, and the sets $S_i$, $i = 0, \ldots, k-1$ are identified with a subsampled version of the hypercube $[m^4]^m$. The vertices of the last set $S_k$ do not have any special structure. Vertices $x \in T$ or $y \in S_i$ will often be treated as points $x, y \in [m^4]^m$. Each node $u$ of $\mathcal{T}$ (except the root) will be labeled with a binary vector $u \in \{0,1\}^m$. We will write $|u|$ to denote the Hamming weight of $u$. For $x \in T$ and $u \in \mathcal{T}$ we use the dot product notation $(x, u) = \sum_{i=1}^m x_i \cdot u_i \in \mathbb{Z}$. For an interval $[a, b]$ and an integer $W$ we will write $[a, b] \cdot W$ to denote the interval $[a \cdot W, b \cdot W]$. Finally, for an integer $i$ and an integer $W$ we will write $i \bmod W$ to denote the residue of $i$ modulo $W$ that belongs to $[0, W-1]$.

For the convenience of the reader, we first give an informal outline of the construction. Given a path $p = (u_0, u_1, \ldots, u_k)$ from the root of $\mathcal{T}$ to a uniformly random leaf, we construct the packing as follows. First, we associate with each node of $\mathcal{T}$ other than the root a subset of $[m]$ (i.e. a binary vector in $\{0,1\}^m$) from a family of subsets of fixed cardinality and with small intersections. Since the subsets corresponding to nodes of $\mathcal{T}$ have small intersections, one can think of them as nearly orthogonal vectors.

We then traverse the path $p$ from the root to the leaf and at step $i$, $i = 0, \ldots, k-1$, we essentially set¹

$$T^{u_{i+1}} := \{x \in T^{u_i} : (x, u_{i+1}) \bmod W \in [1/k, 1] \cdot W\},$$

where $W$ is an appropriately chosen parameter. Thus, traversing a root-to-leaf path amounts to repeatedly cutting the hypercube with hyperplanes whose normal vectors are almost orthogonal. At step $i$ the set $S_i$ is identified with an appropriately subsampled copy of $T^{u_i}$, and a Ruzsa-Szemerédi graph is constructed on $(T^{u_i}, S_i)$. At step $i$, besides defining the new set $T^{u_{i+1}}$, the vector $u_{i+1}$ (corresponding to the next vertex on the path) is used to define a subset $S_j^{u_{i+1}} \subseteq S_j^{u_i}$ for all $j < i$ by similarly cutting $S_j^{u_i}$ with a hyperplane. The most important property of our construction will be the fact that when we reach the leaf $u_k$, most of the edges going out of $S_j^{u_k}$ for $j = 0, \ldots, k-1$ will be contained in $T^{u_k}$, yielding property (2) of $(d, k, \delta)$-packings.
We note that the idea of using nearly orthogonal vectors to construct Ruzsa-Szemerédi graphs was introduced in [7] and further generalized in [8], so this part of our construction adapts known techniques to our setting. Our main contribution here is the approach of constructing a recursive sequence of graphs by cutting the hypercube by nearly orthogonal hyperplanes, which allows us to derive property (2).
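To make the repeated-cutting picture concrete, here is a toy Python sketch under our own assumptions (it is not part of the construction): it replaces $[m^4]^m$ by the small grid $[8]^4$, uses two disjoint-support 0/1 vectors as a hypothetical path, and keeps a point only if $(x, u) \bmod W \in [W/k, W)$ for every vector $u$ on the path prefix, mirroring the definition of $T^{u_{i+1}}$ above.

```python
from itertools import product

def dot(x, u):
    return sum(a * b for a, b in zip(x, u))

def cut_along_path(points, path, k, W):
    """Points surviving every cut (x, u) mod W in [W/k, W) for u on the path,
    i.e. the analogue of T^{u_i} when `path` = (u_1, ..., u_i)."""
    return [x for x in points if all(dot(x, u) % W >= W / k for u in path)]

Y = list(product(range(1, 9), repeat=4))  # toy grid [8]^4 instead of [m^4]^m
path = [(1, 1, 0, 0), (0, 0, 1, 1)]       # hypothetical orthogonal 0/1 vectors
for i in range(len(path) + 1):
    print(i, len(cut_along_path(Y, path[:i], k=2, W=4)))
```

Because the two vectors have disjoint supports, each cut here keeps exactly a $1 - 1/k$ fraction ($4096 \to 2048 \to 1024$); for the nearly orthogonal vectors of Lemma 16 the fractions are only approximately $(1 - 1/k)^i$, which is what Lemma 19 quantifies.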

We now give the details of the construction. We will use the following lemma from [8], which is a convenient formulation of the construction of error correcting codes with fixed weight in [12]:

Lemma 16 [8] For sufficiently large $m > 0$, any constant $\epsilon \in (0, 1)$ and constant $\gamma \in (0, 2)$ there exists a family $\mathcal{F}$ of subsets of $[m]$ of size $\epsilon m$ with pairwise intersections of size at most $\gamma\epsilon^2 m$ such that $\frac{1}{m}\log|\mathcal{F}| \ge c_{\epsilon,\gamma} - o(1)$.
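Lemma 16 is an existence statement obtained from fixed-weight codes. Purely to make the objects concrete, the Python sketch below uses a different and much weaker technique, randomized greedy selection (not the code construction of [12]), to collect subsets of size $\epsilon m$ (over indices $0, \ldots, m-1$) with pairwise intersections at most $\gamma\epsilon^2 m$. The parameter values are arbitrary assumptions, and the sketch says nothing about the rate $c_{\epsilon,\gamma}$.

```python
import random

def greedy_family(m, eps, gamma, tries, seed=0):
    """Randomized greedy stand-in (not the construction of [12]): keep a random
    eps*m-subset of range(m) only if it meets every kept subset in at most
    gamma * eps^2 * m elements."""
    rng = random.Random(seed)
    size = int(eps * m)
    cap = gamma * eps * eps * m
    family = []
    for _ in range(tries):
        cand = frozenset(rng.sample(range(m), size))
        if all(len(cand & f) <= cap for f in family):
            family.append(cand)
    return family, cap

family, cap = greedy_family(m=200, eps=0.25, gamma=1.5, tries=300)
print(len(family), cap)
```

Viewing each kept subset as a 0/1 vector of weight $\epsilon m$, the intersection bound is exactly the near-orthogonality $(w, w') \le \gamma\epsilon|w|$ used later in (13).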

Our main lemma is

Lemma 17 For any constants $k, \delta' > 0$ there exists a $(d, k, \delta')$-packing on $\Theta(n)$ nodes with $d = n^{\Omega(1/\log\log n)}$.

Proof: We associate with each node of the $d$-ary tree $\mathcal{T}$ of height $k$ a vector from a family of almost orthogonal binary vectors of equal weight whose existence is guaranteed by Lemma 16. Since the number of nodes in such a tree is at most $d^{k+1}$, we can afford to set $d = 2^{\Omega(m)}$ since $k$ is constant. Besides associating with each node $u \in \mathcal{T}$ a vector $u$, we also associate with $u$ a random variable $U_u$ that is uniformly distributed over the integers between $0$ and $W - 1$, where $W$ is a parameter that will be chosen later. The variables $U_u$ and $U_{u'}$ are independent for $u \ne u'$.

¹This statement is slightly imprecise in the interest of clarity.



Let $X' = Y = [m^4]^m$ for some integer $m > 0$. Let $X$ be a uniformly random subset of $X'$ where each point of $X'$ appears independently with probability $1/k$. We will refer to vertices in $X$ and $Y$ as points in $[m^4]^m$. We now specify how a graph satisfying the properties in Definition 9 is constructed for a given path $p = (u_0, u_1, \ldots, u_k)$ from the root of $\mathcal{T}$ to a leaf of $\mathcal{T}$.

The path $p$ induces a decomposition of the vertex set $T$ as follows. For all $i = 0, \ldots, k-1$ let

$$T^{u_i} = \{y \in Y : (y, u_j) \bmod W \in [1/k, 1) \cdot W \text{ for all } j \in [1 : i]\},$$
$$S_i = \{x \in X' : (x, u_j) \bmod W \in [1/k, 1) \cdot W \text{ for all } j \in [1 : i]\}.$$

Also, let

$$S_j^{u_i} = \{x \in S_j : (x, u_l) \bmod W \in [1/k, 1) \cdot W \text{ for all } l \in [1 : i]\}, \quad \text{for all } j = 0, \ldots, i-1.$$

The set $S_k$ is a disjoint set of vertices connected to $T^{u_k}$ by a perfect matching.
Consider a fixed $i$ between $0$ and $k - 1$. For all children $w$ of $u_i$ let

$$R_Y(w) = \{y \in T^{u_i} : ((y, w) + U_w) \bmod W \in [0, 1/k] \cdot W\}$$
$$W_Y(w) = \{y \in T^{u_i} : ((y, w) + U_w) \bmod W \in ([1/k, 1/k + \delta] \cup [1 - \delta, 1)) \cdot W\}$$
$$B_Y(w) = \{y \in T^{u_i} : ((y, w) + U_w) \bmod W \in [1/k + \delta, 1 - \delta] \cdot W\}$$

Define $R_X(w)$, $W_X(w)$, $B_X(w)$ similarly (note that these sets are defined only for $S_i$):

$$R_X(w) = \{x \in S_i : ((x, w) + U_w) \bmod W \in [0, 1/k] \cdot W\}$$
$$W_X(w) = \{x \in S_i : ((x, w) + U_w) \bmod W \in ([1/k, 1/k + \delta] \cup [1 - \delta, 1)) \cdot W\}$$
$$B_X(w) = \{x \in S_i : ((x, w) + U_w) \bmod W \in [1/k + \delta, 1 - \delta] \cdot W\}$$

We note here that the random shift $U_w$ is not necessary for most properties that we establish, and will only be useful for establishing property (3). First, we analyze



Size of the sets $T^{u_i}$, $S_j^{u_i}$, $R$, $B$, $W$ and property (4). We will need

Claim 18 Let $\delta > 0$ be a constant such that $1/\delta$ and $\delta W/|u_j|$ are integers, and let $U \in [0 : W-1]$ be an integer. Define for $q = 0, \ldots, 1/\delta - 1$

$$A_q = \{y \in Y : ((y, u_j) + U) \bmod W \in [\delta q, \delta(q + 1)] \cdot W\}. \quad (7)$$

Then $|A_q| \in (1 \pm o(1))\delta|Y|$.

Proof: Consider the mapping $\phi : y \mapsto y - \frac{\delta W}{|u_j|} \cdot u_j$. This is a well-defined mapping into $Y$ for all $y \in Y$ except those that have at least one coordinate at most $\frac{\delta W}{|u_j|} = O(1)$. We denote this set by $R$. But for any fixed $l$ one has $|\{y \in Y : y_l \le \frac{\delta W}{|u_j|}\}| = \frac{\delta W}{|u_j|\, m^4}|Y| = o(|Y|/m^2)$, and hence by the union bound over all $l = 1, \ldots, m$ one has $|R| = o(|Y|)$. For all $q = 1, \ldots, 1/\delta - 1$ the mapping $\phi$ maps $A_q$ injectively into $A_{q-1}$, and $A_0$ into $A_{1/\delta - 1}$, everywhere except $R$. Thus, one has $|A_q| = \delta(1 \pm o(1))|Y|$, and the conclusion of the claim follows. ■
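Claim 18 says that the residues $((y, u_j) + U) \bmod W$ are almost equidistributed over $Y$. The Python check below uses a toy instance of our own choosing ($Y = [24]^2$, $u_j = (1, 1)$, $W = 8$, $\delta = 1/4$, so that $\delta W/|u_j| = 1$ is an integer as the claim requires) and counts the sets $A_q$. It uses half-open slabs $[\delta q, \delta(q+1)) \cdot W$ so that they partition $[0, W)$.

```python
from itertools import product

def slab_counts(N, m, u, W, delta, U=0):
    """|A_q| for q = 0, ..., 1/delta - 1 over Y = [N]^m, using half-open slabs."""
    nq = round(1 / delta)
    counts = [0] * nq
    for y in product(range(1, N + 1), repeat=m):
        r = (sum(a * b for a, b in zip(y, u)) + U) % W
        counts[int(r // (delta * W))] += 1
    return counts

print(slab_counts(24, 2, (1, 1), 8, 0.25))
print(slab_counts(24, 2, (1, 1), 8, 0.25, U=3))
```

In this instance the side length $24$ is a multiple of $W$, so every count equals $\delta|Y| = 144$ for any shift $U$; in general the $o(1)$ term accounts for the boundary points collected in $R$.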
We first prove

Lemma 19 Consider any set $S$ defined by $S = \{y \in Y : (y, u) \bmod W \in [a_u, b_u] \cdot W, u \in U\}$, where $U$ is a collection of binary vectors and $a_u, b_u$ are constants. Let $v$ be a vector such that $|u| = |v|$ for all $u \in U$ and $\max_{u \in U}(u, v)/|v| \le \delta'$, and let $A, B \in [0, 1]$, $A < B$ be rational constants. Let

$$S' = \{y \in S : (y, v) \bmod W \in [A, B] \cdot W\}.$$

Then for sufficiently large $W = O(m)$ one has $\big||S'| - (B - A)|S|\big| = O(|U|\delta') \cdot |Y|$.

Proof: Consider the mapping $\phi_{v,j} : Y \to Y$, $y \mapsto y + \frac{j\,\delta(B - A)W}{|v|} \cdot v$, where $\delta$ is a sufficiently small rational constant such that $1 - (B - A)$ is an integer multiple of $\delta(B - A)$. Note that the mapping is well-defined as long as $W$ is an integer multiple of $1/(\delta(B - A))$, which is admissible under our assumption that $W = O(m)$. Let $y \in S$. Then

$$(\phi_{v,j}(y), u) = (y, u) + \frac{j\,\delta(B - A)W}{|v|}(u, v), \qquad \left|\frac{j\,\delta(B - A)W}{|v|}(u, v)\right| \le |j| \cdot \delta(B - A)W\delta',$$

so for $|j| \le 1/(\delta(B - A))$ the mapping $\phi_{v,j}$ maps points $y \in S$ into $S$ unless either

$$(y, u) \bmod W \in ([a_u, a_u + \delta'] \cup [b_u - \delta', b_u]) \cdot W \quad (8)$$

for at least one $u \in U$, or $y$ has at least one coordinate smaller than $W$. We call such points bad and denote this set by $R$. For a fixed $u$ the fraction of $y \in Y$ that satisfy (8) is $O(\delta')$ by Claim 18, and hence by the union bound over all $u \in U$ we get that the fraction of such points in $Y$ is $O(|U|\delta')$. For a fixed coordinate, the fraction of points whose value in that coordinate is smaller than $W$ is at most $W/m^4$, and hence by the union bound over the $m$ coordinates the fraction of points with at least one coordinate smaller than $W$ is at most $mW/m^4 = o(1)$, so $|R| = O(|U|\delta') \cdot |Y|$.
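Similarly to Claim 18 define

$$A_q = \{y \in S : (y, v) \bmod W \in [(B - A)\delta q, (B - A)\delta(q + 1)] \cdot W\}. \quad (9)$$

Now let $D = \left[0 : \frac{1}{(B - A)\delta} - 1\right]$ denote the set of indices such that $S = \bigcup_{d \in D} A_d$, and let $D' = \left[\frac{A}{(B - A)\delta} : \frac{B}{(B - A)\delta} - 1\right]$ denote the set of indices such that $S' = \bigcup_{d \in D'} A_d$.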

Define a bipartite graph $F = (S', S \setminus S', E_F)$ by including an edge $(x, y)$, $x \in S'$, $y \in S \setminus S'$ in $E_F$ whenever $\phi_{v,j}(x) = y$ for some $j \in D$. Thus, each $x \in S' \setminus R$ has degree $|D \setminus D'|$ in $F$, and each $x \in (S \setminus S') \setminus R$ has degree $|D'|$ in $F$. Furthermore, the degree of each $x \in S'$ is bounded by $|D \setminus D'|$ and the degree of each $x \in S \setminus S'$ is bounded by $|D'|$.

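Putting these estimates together, we have $|S' \setminus R| \cdot |D \setminus D'| \le |S \setminus S'| \cdot |D'|$, i.e.

$$|S'| \le (|S| - |S'|) \cdot \frac{|D'|}{|D \setminus D'|} + |R| = (|S| - |S'|) \cdot \frac{B - A}{1 - (B - A)} + |R|.$$

Thus, $|S'| \le (B - A) \cdot |S| + (1 - (B - A))|R|$. On the other hand, we also have $|(S \setminus S') \setminus R| \cdot |D'| \le |S'| \cdot |D \setminus D'|$, i.e.

$$|S \setminus S'| \le |S'| \cdot \frac{|D \setminus D'|}{|D'|} + |R| = |S'| \cdot \frac{1 - (B - A)}{B - A} + |R|.$$

Thus, $(B - A)(|S| - |S'|) \le |S'| \cdot (1 - (B - A)) + (B - A)|R|$, so $|S'| \ge (B - A)|S| - (B - A)|R|$. The conclusion of the lemma follows. ■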
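The bookkeeping in this proof can be checked with exact rational arithmetic. In the Python sketch below, the values $A = 1/4$, $B = 1/2$, $\delta = 1/2$ (chosen so that the divisibility assumptions hold) and the sizes $|S| = 1000$, $|R| = 40$ are hypothetical. `index_sets` builds $D$ and $D'$, and `window` returns the interval for $|S'|$ implied by the two degree-counting inequalities. Its width is exactly $|R|$, which is why $\big||S'| - (B - A)|S|\big| = O(|R|)$.

```python
from fractions import Fraction

def index_sets(A, B, delta):
    """D and D' from the proof of Lemma 19 (assumes the divisibility conditions hold)."""
    step = (B - A) * delta                         # slab width, in units of W
    D = range(0, int(1 / step))                    # all slabs, covering [0, 1)
    D_prime = range(int(A / step), int(B / step))  # slabs covering [A, B)
    return D, D_prime

def window(S, R, BA):
    """Interval for |S'| given by the two degree-counting inequalities."""
    return BA * S - BA * R, BA * S + (1 - BA) * R

A, B, delta = Fraction(1, 4), Fraction(1, 2), Fraction(1, 2)
D, D_prime = index_sets(A, B, delta)
print(len(D), len(D_prime), window(1000, 40, B - A))
```

Note that $|D'|/|D| = B - A$, which is the ratio that turns the degree counts into the factor $(B - A)$ in the statement.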
Estimates on the size of the sets $T^{u_i}$ now follow by noting that one has $|U| \le k$ in all cases, and that the maximum dot product $\delta'$ can be chosen to be $1/\mathrm{poly}(k)$. The bounds on the sizes of $S_j^{u_i}$, $R$, $B$, $W$ follow in a similar way, with the additional application of Chernoff bounds to the random sampling of the points of $X'$ that are included in $X$.

We now define the edges of the $((k - 1)\gamma, k\gamma, O(\delta))$-almost regular induced subgraph $H_i^w$, for a constant $\gamma > 0$ (the induced property will be shown later). The subgraph $H_i^w$ will consist of disjoint copies of small complete bipartite graphs.
Constructing $H_i^w$. Fix a child $w$ of $u_i$. For the purposes of constructing $H_i^w$ we condition on the values of all shifts $U_w$. In what follows we omit the parameter $w$ when referring to the sets $R_Y(w)$, $W_Y(w)$, $B_Y(w)$. For two vertices $b, b' \in R_Y$ such that $|(b - b', w)| \le W/k$ we say that $b \sim b'$ if $b - b' = \lambda \cdot w$ for some $\lambda$. Note that we have $\lambda \in \left[-\frac{W}{k|w|}, \frac{W}{k|w|}\right]$. We write $B_b \subseteq R_Y$ to denote the equivalence class of $b$. It follows directly from the definition of $B_b$ and $R_Y$ that $|B_b| = W/(k|w|)$ for all $b$. Also, let

$$A_b = \bigcup_{\lambda \in [0, (1 - 1/k)W/|w|]} (B_b + \lambda \cdot w).$$

Note that $A_b$ is a random set (determined by the random choice of $X \subseteq X'$). Since each element of $X'$ is included in $X$ independently with probability $1/k$, we have that $\mathbb{E}[|A_b|] = (1 \pm O(\delta))(1 - 1/k)|B_b|$.

We now define a set of edges of a $((k - 1)\gamma, k\gamma, \delta)$-almost regular subgraph between (a subset of) $B_b$ and $A_b$. Since $X$ is obtained from $X'$ by independent sampling at rate $1/k$, standard concentration inequalities yield

$$\Pr\left[|A_b| \notin (1 \pm \delta)(1 - 1/k)|B_b|\right] \le e^{-\delta^2 (1/2)|B_b|/4} \le \delta^2 \quad (10)$$

for $|B_b| \ge \gamma = 16\ln(8/\delta)/\delta^2$. To ensure this, it is sufficient to ensure that $W \ge 16k\ln(8/\delta)|w|/\delta^2$. We note here that we are thinking of $\delta$ as being smaller than $1/k$. In particular, we will set $\delta = O(1/\mathrm{poly}(k))$ at the end of the construction. We will define a complete bipartite graph on each such pair $A_b$, $B_b$, i.e. for each $c \in A_b$, $d \in B_b$ include an edge $(c, d)$ in $H_i^w$. However, since we used randomness to choose the set $X$, some of these classes may be too small due to stochastic fluctuations. We deal with this problem next.
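As a sanity check on the constants around (10), the Python snippet below (with illustrative values of $\delta$, $k$ and $|w|$ that are our own assumptions) computes $\gamma = 16\ln(8/\delta)/\delta^2$, the resulting lower bound $W \ge k\gamma|w|$ that guarantees $|B_b| = W/(k|w|) \ge \gamma$, and the tail $e^{-\delta^2(1/2)|B_b|/4}$ at $|B_b| = \gamma$. That tail simplifies to $e^{-2\ln(8/\delta)} = \delta^2/64 \le \delta^2$.

```python
import math

def class_size_threshold(delta):
    """gamma = 16 ln(8/delta) / delta^2, the class-size threshold used after (10)."""
    return 16 * math.log(8 / delta) / delta ** 2

def tail_bound(delta, size):
    """Right-hand side exp(-delta^2 * (1/2) * size / 4) of (10)."""
    return math.exp(-delta ** 2 * 0.5 * size / 4)

delta, k, w_weight = 0.05, 4, 100  # illustrative values; w_weight stands for |w|
gamma = class_size_threshold(delta)
W_min = k * gamma * w_weight       # |B_b| = W / (k|w|) >= gamma  iff  W >= k * gamma * |w|
print(round(gamma, 1), round(W_min), tail_bound(delta, gamma), delta ** 2 / 64)
```

Since $|w| = \epsilon m$, this choice of $W$ is still $O(m)$, as Lemma 19 requires.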

We now classify points $b \in R_Y$ as good or bad depending on how close $|A_b|$ is to its expectation. In particular, mark $b$ bad if $|A_b| \notin (1 \pm \delta)(1 - 1/k)|B_b|$ and good otherwise. Note that in fact this is a well-defined property of an equivalence class. Let $J_B$ denote the indicator random variable that equals $1$ if $B$ is bad and $0$ otherwise, where $B$ is an equivalence class. Note that $J_B$ is independent of $J_{B'}$ for $B \ne B'$, since $J$ is determined by the random choice of $X \subseteq X'$ and we are conditioning on the values of all shifts $U_w$, $w \in \mathcal{T}$. By (10) one has $\mathbb{E}[J_B] \le \delta^2$ for all equivalence classes $B$. Note that each equivalence class contains a constant number of points, and hence there are $\Omega(m^{4m})$ equivalence classes for every $i$ and every child $w$ of $u_i$.

An application of Chernoff bounds shows that for fixed $i$ and fixed $w$ a child of $u_i$

$$\Pr\left[\sum_B J_B > 2\,\mathbb{E}\Big[\sum_B J_B\Big]\right] < e^{-\Omega(m^{4m})}. \quad (11)$$

Note that by (10) one has that (11) bounds the probability of there being more than a $2\delta^2$ fraction of bad classes for fixed $w \in \mathcal{T}$. Taking a union bound over the $2^{O(m)}$ nodes of $\mathcal{T}$, we conclude that there will be no more than a $2\delta^2$ fraction of bad equivalence classes in $H_i^w$ for any $i$ and $w$ a child of $u_i$.
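The following Python simulation illustrates (11) in the worst case permitted by (10): it models the indicators $J_B$ as independent Bernoulli variables with mean exactly $\delta^2$ (a modelling assumption, since (10) only gives an upper bound on the mean) and compares the empirical fraction of bad classes with the threshold $2\delta^2$.

```python
import random

def bad_fraction(num_classes, p_bad, seed=1):
    """Empirical fraction of bad classes when each indicator J_B is an independent
    Bernoulli(p_bad) variable (a modelling assumption; (10) only bounds the mean)."""
    rng = random.Random(seed)
    return sum(rng.random() < p_bad for _ in range(num_classes)) / num_classes

delta = 0.1
frac = bad_fraction(200_000, delta ** 2)
print(frac, 2 * delta ** 2)
```

Already with a few hundred thousand classes, exceeding twice the mean would be a deviation of dozens of standard deviations; with $\Omega(m^{4m})$ classes the bound $e^{-\Omega(m^{4m})}$ in (11) is far smaller still.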

If $b$ is good, let $A'_b$ denote an arbitrary subset of $A_b$ of cardinality $(1 - \delta)(1 - 1/k)|B_b|$. Similarly, let $B'_b$ denote an arbitrary subset of $B_b$ of cardinality $(1 - \delta)|B_b|$, so that $|A'_b| = (1 - 1/k)|B'_b|$. Now for each $c \in A'_b$, $d \in B'_b$ include an edge $(c, d)$ in $H_i^w$. Note that each such graph is a $((k - 1)\gamma, k\gamma, \delta)$-almost regular graph, as required by property (1). Note that all edges of $H_i^w$ are of the form $(c, d)$, where

$$c = d - \lambda \cdot w, \quad \lambda \in (0, W/|w|). \quad (12)$$



The union of the small complete graphs that we constructed yields the graph $H_i^w$ for a fixed child $w$ of $u_i$. We also showed that with high probability no graph $H_i^w$ contains more than a $2\delta^2$ fraction of bad classes, which completes the construction of the graphs $H_i^w$.
Induced property (property (1)). Graphs $H_i^w$ constructed in this way are induced for the same reason as in [7, 8] when the vectors $w, w'$ corresponding to two distinct nodes of $\mathcal{T}$ are chosen in such a way that $|w| = |w'| = \epsilon m$ (recall that $X' = Y = [m^4]^m$) and

$$(w, w') \le (\delta/2)\epsilon|w| \quad (13)$$

for a sufficiently small constant $\epsilon$. Indeed, consider a fixed $i$ and suppose that an edge $(c, d) \in E(H_i^w)$ is induced by $H_i^{w'}$ for $w' \ne w$. But then it must be that either $c \in R_Y(w')$, $d \in B_X(w')$ or $d \in R_Y(w')$, $c \in B_X(w')$. In either case one has

$$|(c - d, w')| \ge \delta \cdot W. \quad (14)$$

However, by (13) together with (12) one has

$$|(c - d, w')| \le \frac{W}{|w|}(w, w') \le \frac{W}{|w|}(\delta/2)\epsilon|w| = (\delta/2)\epsilon W,$$

which is a contradiction with (14) for $\epsilon < \delta/10$.



Existence of a large matching (property (3)). We now show that for any $i$ and $w$ a child of $u_i$ there exists a matching of a $1 - O(\delta)$ fraction of $S_i$ to $T^{u_i} \setminus T^w$. We will do this by exhibiting a fractional matching of appropriate size.

Consider a point $x \in T^{u_i}$. We need to analyze the degree of $x$ in the graph on $T^{u_i} \cup S_i$. Note that the degree of $x$ depends on (1) the number of vectors $w$ for which $x \in R_Y(w)$ and (2) the size of the equivalence classes that $x$ belongs to for different $w$. We first analyze (1).

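For a fixed $w$ it follows by Claim 18 and the definition of $U_w$ that $\Pr_{U_w}[x \in R_Y(w)] \in (1 \pm o(1))\frac{1}{k}$. Next note that each vertex $x \in R_Y(w)$ has degree $(k - 1)\gamma$ in $H_i^w$. Furthermore, since the random shifts $U_w$ are independent for different $w$, we obtain using Chernoff bounds that for a fixed $x \in T^{u_i}$

$$\Pr\left[\Big|\,\big|\{w \text{ child of } u_i : x \in R_Y(w)\}\big| - \frac{d}{k}\,\Big| > \frac{\delta d}{k}\right] < e^{-\Omega(\delta^2 d/k)}. \quad (15)$$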



A similar argument shows that the degree of each vertex in $S_i \setminus S_i^w$ has similar concentration around $k\gamma d$. Since there are only $O(m^{4m})$ vertices and $2^{O(m)}$ nodes in the tree $\mathcal{T}$, and $d = 2^{\Omega(m)}$, a union bound shows that vertex degrees are concentrated in each pair $(T^{u_i}, S_i)$ with high probability. Now it remains to handle the loss of edges due to $x \in T^{u_i}$ belonging to small equivalence classes for some $w$. However, it follows from the analysis in (11) that at most an $O(\delta^2)$ fraction of the edge mass can be lost because of this, yielding the following fractional matching. Put weight $1/(k\gamma)$ on each edge in $H_i^w$, and put weight

$$\frac{1}{(1 + O(\delta))k(1 - 1/k)\gamma d}$$

on each edge going from $T^{u_i} \setminus T^w$ to $S_i \setminus S_i^w$. Since degrees in $T^{u_i}$ are bounded by $(1 + O(\delta))(1 - 1/k)\gamma d$, and degrees in $S_i$ are bounded by $(1 + O(\delta))k\gamma d$, this is feasible and yields a matching of size $(1 - O(\delta + \delta^2))|S_i|$, proving property (3).

We now prove property (2). For $i = 0, \ldots, k - 1$ let

$$Z^{u_i} = \{y \in Y : (y, u_j) \bmod W \in ([1/k - \delta, 1/k] \cup [0, \delta]) \cdot W \text{ for some } j \in [1 : k]\}. \quad (16)$$

We need to show that the subgraph $H^*$ induced by $(T^{u_i} \setminus (T^{u_k} \cup Z^{u_i})) \cup S_i^{u_k}$ only contains the edges of $H_i^{u_{i+1}}$. First note that if an edge $(c, d)$ belongs to $H^*$, then $c \in S_i^{u_k}$ and $d \in T^{u_i}$, so $(c, d)$ necessarily belongs to some graph $H_i^w$, where $w$ is a child of $u_i$. Then we have by (12) that

$$d - c = q \cdot w, \quad \text{where } |q| \le W/|w|.$$
On the other hand, if $w \ne u_{i+1}$, we have for all $j = 1, \ldots, k$, using the orthogonality condition (13),

$$|(c - d, u_j)| \le \frac{W}{|w|}|(w, u_j)| \le (\delta/2)\epsilon W. \quad (17)$$

Now recall that $c \in S_i^{u_k}$, so by the definitions of $S_i$ and $S_i^{u_k}$

$$(c, u_j) \bmod W \in [1/k, 1] \cdot W, \quad \forall j = 1, \ldots, k.$$

Thus, by (17) one has

$$(d, u_j) \bmod W \in ([1/k - \delta, 1] \cup [0, \delta]) \cdot W, \quad \forall j \le k,$$

i.e. $d \in Z^{u_i} \cup T^{u_k}$, if we set $\epsilon$ to be smaller than $\delta/10$.

It remains to bound the size of $Z^{u_i}$. First note that it follows from Claim 18 that for a sufficiently small constant $\delta$ (e.g. $\delta < 1/k^2$) one has

$$|\{y \in Y : (y, u_j) \bmod W \in ([1/k - \delta, 1/k] \cup [0, \delta]) \cdot W\}| \le (2 + o(1))\delta|Y|. \quad (18)$$

Now by a union bound over all $j \in [1 : k]$ we conclude that $|Z^{u_i}| \le (2 + o(1))\delta k|Y| = O(\delta k n)$.

It remains to set parameters. First, inspection of the bounds obtained so far reveals that setting $\delta = c\delta'/k$ for a sufficiently small constant $c > 0$ is sufficient to obtain a $(d, k, \delta')$-packing, where we set $\epsilon = \delta/10$. Finally, the size of the graphs obtained is essentially the same as in [7] and [8]. In particular, the number of vertices is $n = O(m^{4m})$ and $d = 2^{\Omega(m)}$. Thus, we get a graph on $n$ vertices with $d = n^{\Omega(1/\log\log n)}$. ■

Proof of Theorem 1: The proof follows by combining Theorem 11 and Lemma 17. ■



4 Multipass approximation for matchings 

In this section we present the basic version of our algorithm for approximating matchings in multiple passes in the vertex arrival setting. Let $G = (P, Q, E)$ denote a bipartite graph. We assume that vertices in $P$ arrive in the stream together with all their edges. At each step the algorithm maintains a fractional matching $\{f_e\}_{e \in E}$, where the capacity of each vertex in $Q$ is infinite and the capacity of each vertex $u \in P$ is equal to the number of times it has appeared so far (i.e. always between $1$ and $k$). The capacity of an edge $e = (u, v)$, $u \in P$, $v \in Q$ is equal to the capacity of $u$. For a vertex $u \in P$ we write $\delta(u)$ to denote the set of edges between $u$ and its neighbors in $G$.

4.1 Algorithm 

We now give the algorithm and show how to implement each pass in linear time. 
Algorithm 1: PROCESS-VERTEX($G$, $u$, $\delta(u)$)

1: Augment the capacity of $u$ and of all edges in $\delta(u)$ by 1.
2: WATER-FILLING($G'$, $u$, $\delta(u)$)
3: REMOVE-CYCLES($G'$, $f$).

The function WATER-FILLING($G'$, $u$, $\delta(u)$) increases the load of the least loaded neighbors of $u$ simultaneously (with other neighbors joining when the load reaches their level) until one unit of water is dispensed out of $u$. Here the support of the fractional matching $\{f_e\}_{e \in E}$ maintained by the algorithm is denoted by $G'$. The function REMOVE-CYCLES($G'$, $f$) reroutes flow along cycles that could have emerged in the process, ensuring that the flow is supported on at most $|P| + |Q| - 1$ edges. We note that as stated, Algorithm 1 does not necessarily take $O(m)$ time per pass due to the runtime of cycle removal. However, simply buffering incoming vertices until the number of edges received is $\Theta(n)$ and only then removing cycles yields a linear time implementation. Here we can use DFS to reroute flow along cycles in time linear in the number of nodes.
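A minimal Python sketch of the passes, under simplifying assumptions of ours: loads are tracked only per vertex of $Q$, each arrival of $u$ dispenses exactly one new unit via water-filling, and REMOVE-CYCLES is omitted because, as described above, it only reroutes flow along cycles and so leaves vertex loads unchanged. The helper names `water_fill` and `multipass_value` are hypothetical; the latter returns $\frac{1}{k}\sum_{v \in Q}\min\{k, l^k(v)\}$, the matching size used in the analysis below.

```python
def water_fill(load, nbrs, amount=1.0):
    """WATER-FILLING: raise the least-loaded neighbours to a common level until
    `amount` units have been dispensed; updates `load` in place."""
    levels = sorted(load[v] for v in nbrs)
    prefix = 0.0
    level = 0.0
    for i, lv in enumerate(levels):
        prefix += lv
        level = (amount + prefix) / (i + 1)  # common level if the i+1 lowest share the water
        if i + 1 == len(levels) or level <= levels[i + 1]:
            break
    for v in nbrs:
        load[v] = max(load[v], level)


def multipass_value(adj, order, k):
    """Run k passes over the stream `order` (adjacency lists in `adj`) and return
    (1/k) * sum_v min(k, load(v))."""
    load = {v: 0.0 for u in order for v in adj[u]}
    for _ in range(k):
        for u in order:
            water_fill(load, adj[u])
    return sum(min(k, x) for x in load.values()) / k


adj = {"a": ["x", "y"], "b": ["y"]}
for k in (1, 2):
    print(k, multipass_value(adj, ["a", "b"], k))
```

For this example ($a$ adjacent to $x, y$ and $b$ adjacent to $y$, maximum matching of size 2), one pass gives loads $0.5$ and $1.5$ and value $1.5$, and two passes give value $1.75$, i.e. ratios $0.75$ and $0.875$.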
Remark 20 We note that a single pass of this algorithm is different from the one-pass algorithm that achieves a $1 - 1/e$ approximation from [8]. However, we will later show that our algorithm in fact also achieves the ratio of $1 - 1/e$ in a single pass.

We now turn to analyzing the approximation ratio. We first give a sketch of the proof under additional 
assumptions on the graph G, and then proceed to give the relevant definitions and the complete argument. 



4.2 Analysis for a simple case 

We now assume that $G = (P, Q, E)$ has a perfect matching $M$. For each $k \ge 1$ and all $x \ge 0$ denote by $b^k(x)$ the number of vertices in $Q$ that have load at least $x$ after $k$ passes. We start by pointing out some useful properties of the function $b^k(x)$. First, note that $b^k(0) = |M|$, $b^k(x)$ is non-increasing in $x$, and $b^k(x) - b^{k-1}(x) \ge 0$ for all $x$. Furthermore, we have

$$\int_0^\infty b^k(x)\,dx = k|M|, \quad (19)$$

since every vertex $u \in P$ contributes $1$ unit of water in each pass, amounting to $k|M|$ units of water overall, and (19) calculates the sum of loads on all $v \in Q$. Furthermore, note that the size of the matching constructed by the algorithm after $k$ passes is exactly equal to

$$\frac{1}{k}\int_0^k b^k(x)\,dx, \quad (20)$$

since every vertex $v \in Q$ with load $x$ contributes $\frac{1}{k}\min\{k, x\}$ to the matching. Hence the approximation ratio after $k$ passes is at least

$$1 - \frac{1}{k|M|}\int_k^\infty b^k(x)\,dx, \quad (21)$$

where we used (19) to convert (20) into (21). Thus, it is sufficient to upper bound $\int_k^\infty b^k(x)\,dx$ in order to analyze the approximation ratio, and we turn to bounding this quantity.

First consider the case $k = 1$. Fix $x > 0$ and consider the vertices $u \in P$ that placed water at level at least $x$ during the first pass. Since each such vertex dispenses one unit of water and the total amount of water at levels above $x$ is $\int_x^\infty b^1(s)\,ds$, there are at least $\int_x^\infty b^1(s)\,ds$ of them. For each such vertex $u$ consider its match $M(u)$. Since $u$ placed water at level at least $x$, its match $M(u)$ must have been at level at least $x$ when $u$ arrived, and levels are monotone increasing. Hence, we have

$$b^1(x) \ge \int_x^\infty b^1(s)\,ds \quad (22)$$

for all $x > 0$. This, however, together with (19) can be shown to imply that $\int_x^\infty b^1(s)\,ds \le |M| \cdot e^{-x}$ for all $x$. We immediately get using (21) that the approximation ratio after one pass is at least $1 - 1/e$.

Now suppose that $k > 1$ and consider the vertices $u \in P$ that placed water at level at least $x$ during the $k$-th pass. The amount of water added at levels at least $x$ during the $k$-th pass is $\int_x^\infty (b^k(s) - b^{k-1}(s))\,ds$, so there are at least that many such vertices. Since these vertices $u$ placed water at level at least $x$ during the $k$-th pass, their matches $M(u)$ must also have been at level at least $x$ when they arrived, implying that

$$b^k(x) \ge \int_x^\infty (b^k(s) - b^{k-1}(s))\,ds \quad (23)$$

for all $x > 0$. Solving (23), we get that for all $k \ge 1$

$$\int_x^\infty b^k(s)\,ds \le |M| \cdot \int_x^\infty F_k(s)\,ds, \quad (24)$$

where $1 - F_k(x)$ is the cdf of the Gamma distribution with scale $1$ and shape $k$, i.e. $F_k(x) = \int_x^\infty e^{-s}s^{k-1}/(k-1)!\,ds$. Using this in (21) yields the desired bound on the approximation ratio, i.e. $1 - e^{-k}k^{k-1}/(k-1)!$.
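The bound $1 - e^{-k}k^{k-1}/(k-1)!$ is easy to tabulate. The Python sketch below (an illustration, not part of the paper) evaluates it together with the Stirling approximation $1 - 1/\sqrt{2\pi k}$ of the same expression, which makes the $1 - O(1/\sqrt{k})$ behaviour visible.

```python
import math

def ratio_bound(k):
    """1 - e^{-k} k^{k-1} / (k-1)!: the k-pass guarantee derived above."""
    return 1 - math.exp(-k) * k ** (k - 1) / math.factorial(k - 1)

for k in (1, 2, 4, 8, 16):
    print(k, round(ratio_bound(k), 4), round(1 - 1 / math.sqrt(2 * math.pi * k), 4))
```

The subtracted term shrinks by the factor $e^{-1}(1 + 1/k)^k < 1$ from $k$ to $k + 1$, so the guarantee increases strictly with the number of passes.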
4.3 General case

The proof sketch we gave in the previous subsection works under the assumption that $G$ has a perfect matching. The general case turns out to be substantially more involved. Interestingly, while the analysis above proceeds by showing that not too much mass will be in the tail $\int_k^\infty b^k(x)\,dx$, here we find it more convenient to show that substantial mass will be in the head of the distribution, i.e. to bound $\int_0^k b^k(x)\,dx$ from below. We extend the argument using a careful reweighting of vertices and scaling of levels guided by the structure of the canonical decomposition of $G$ introduced in [8], which we now define.

Let $G = (P, Q, E)$ denote a bipartite graph. For a set $S \subseteq P$ we denote the set of neighbors of $S$ by $\Gamma(S)$. For a number $\alpha > 0$ the graph $G$ is said to have vertex expansion at least $\alpha$ if $|\Gamma(S)| \ge \alpha|S|$ for all $S \subseteq P$.

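Definition 21 (Canonical decomposition) Let $G = (P, Q, E)$ denote a bipartite graph. A partition $Q = \bigcup_{j \in \mathcal{I}} T_j$, $T_i \cap T_j = \emptyset$ for $j \ne i$, and $P = \bigcup_{j \in \mathcal{I}} S_j$, $S_i \cap S_j = \emptyset$ for $j \ne i$, together with numbers $\alpha_j > 0$, where $\alpha_j < 1$ for $j < 0$ and $\alpha_j \ge 1$ for $j \ge 0$, is called a canonical partition if

1. for all $i$ one has $\Gamma\big(\bigcup_{j \in \mathcal{I}, j \le i} S_j\big) \subseteq \bigcup_{j \in \mathcal{I}, j \le i} T_j$;

2. $|\Gamma(S) \cap T_j| \ge \alpha_j|S|$ for all $S \subseteq S_j$, for all $j \in \mathcal{I}$;

3. $|T_j|/|S_j| = \alpha_j$, $j \in \mathcal{I}$.

Here $\mathcal{I} \subseteq \mathbb{Z}$ is a set of indices. Please see Fig. 3 for an illustration.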

Remark 22 For $k = 1$, the analysis is inspired by the analysis of the round-robin algorithm in [15]. We note that the difference in our case is that we essentially consider a fractional version of their process, and obtain significantly better bounds on the quality of approximation. In particular, the best approximation factor that follows from the result of [15] is $1/8$ even after any $k$ passes, while here we get the optimal $1 - 1/e$ factor for $k = 1$, and an approximation of the form $1 - O(1/k^{1/2})$ for all $k > 0$.
We now introduce some definitions. For a node $v \in Q$ let $l^k(v)$ denote the load of $v$ after the $k$-th pass. Note that $l^k(v) \ge 0$ and may in general grow with $|P|$ for the most loaded vertices in $Q$. The core of our analysis will consist of bounding the distribution of water levels among vertices in $Q$, showing that there cannot be too many highly overloaded vertices. It will be convenient to assume that water is allocated in multiples of some $\Delta_{\mathrm{org}} > 0$ (such a $\Delta_{\mathrm{org}}$ always exists since we are dealing with a finite process).

Shadow allocation and density function $\phi^k(x)$. First, define

(Source capacities) Define $w_s(u)$, $u \in P$ by setting $w_s(u) = \min\{1, \alpha_j\}$ for $u \in S_j$. Note that one has $\sum_{u \in P} w_s(u) = |M|$. Also, for $v \in T_j$ let $w_s(v) := \min\{1, \alpha_j\}$ for convenience.

(Sink capacities) Define $w_t(v)$, $v \in Q$ by setting $w_t(v) = \min\{1, 1/\alpha_j\}$ for $v \in T_j$. Note that one has $\sum_{v \in Q} w_t(v) = |M|$. Also, for $u \in S_j$ let $w_t(u) := \min\{1, 1/\alpha_j\}$ for convenience.
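The two normalisations can be checked mechanically: by property 3, $|T_j| = \alpha_j|S_j|$, so both $\sum_{u \in P} w_s(u)$ and $\sum_{v \in Q} w_t(v)$ equal $\sum_j \min\{|S_j|, |T_j|\}$, the quantity the text identifies with $|M|$. The Python sketch below does this with exact fractions for a hypothetical list of block sizes $(|S_j|, |T_j|)$.

```python
from fractions import Fraction

def capacity_totals(blocks):
    """blocks: hypothetical (|S_j|, |T_j|) pairs; alpha_j = |T_j| / |S_j| by property 3.
    Returns (sum of w_s over P, sum of w_t over Q)."""
    total_s = total_t = Fraction(0)
    for s, t in blocks:
        alpha = Fraction(t, s)
        total_s += s * min(Fraction(1), alpha)      # w_s(u) = min{1, alpha_j} for u in S_j
        total_t += t * min(Fraction(1), 1 / alpha)  # w_t(v) = min{1, 1/alpha_j} for v in T_j
    return total_s, total_t

blocks = [(4, 2), (3, 3), (2, 5)]  # alpha = 1/2, 1, 5/2 (increasing, as in Definition 21)
print(capacity_totals(blocks), sum(min(s, t) for s, t in blocks))
```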

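We will use the concept of a shadow allocation, in which whenever $a$ units of water are added to a vertex $v \in Q$ in the original allocation, $a/w_t(v)$ units of water are added to $v$ in the shadow allocation. Now whenever water from a vertex $u \in P$ is added to a vertex $v \in Q$ at level $x$ during the $j$-th pass in the shadow allocation, we let $\phi_v^j(x) := w_s(u)$, where $\phi_v^j$ is the density function. It will be crucial that

$$\sum_{v \in Q} w_t(v)\int_0^\infty \phi_v^j(x)\,dx = |M| \quad (25)$$

for all $j = 1, \ldots, k$ (each $u \in P$ dispenses one unit of water in pass $j$; a part $a$ of it placed on $v$ occupies shadow height $a/w_t(v)$ with density $w_s(u)$, so after weighting by $w_t(v)$ it contributes $a \cdot w_s(u)$, and $\sum_{u \in P} w_s(u) = |M|$). We assume that water in the shadow allocation is allocated in multiples of some $\Delta > 0$. Then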

Lemma 23 One has for all $x \ge 0$ and all $k \ge 1$

$$b^k(x) \ge \int_x^\infty \sum_{v \in Q} w_t(v)\phi_v^k(s)\,ds.$$

Proof: Recall that the pairs in the canonical decomposition of $G$ are denoted by $(S_j, T_j)$, where the expansion factors $\alpha_j$ are increasing with $j$. We need to show that

$$b^k(x) \ge \int_x^\infty \sum_{v \in Q} w_t(v)\phi_v^k(s)\,ds. \quad (26)$$

By definition of the canonical decomposition $(S_j, T_j)_{j \in \mathcal{I}}$, for each $j \in \mathcal{I}$ there exists a (possibly fractional) matching $M_j$ in $G$ that matches each $u \in S_j$ exactly $\alpha_j$ times and each $v \in T_j$ exactly once. Let $M_j(u, v) \in [0, 1]$ denote the extent to which $u$ is matched to $v$, so that $\sum_{v \in T_j} M_j(u, v) = \alpha_j$ for all $u \in S_j$ and $\sum_{u \in S_j} M_j(u, v) = 1$ for all $v \in T_j$.

Consider a node $u \in S_j$ and suppose that a $\Delta_{\mathrm{org}}$ amount of its water was allocated to the level interval $[i\Delta_{\mathrm{org}}, (i+1)\Delta_{\mathrm{org}}]$ of a vertex $v \in T_r$ in the original allocation. Note that $r \le j$ since there are no edges from $S_j$ to $T_r$, $r > j$. By the definition of the shadow allocation, a $\Delta_{\mathrm{org}}$ amount of water in the original allocation corresponds to $\Delta_{\mathrm{org}}/w_t(v)$ water placed contiguously in the shadow allocation. Let $t := \Delta_{\mathrm{org}}/(w_t(v)\Delta)$ and let $\Delta \cdot q, \ldots, \Delta \cdot (q + t - 1)$ denote the $t$ contiguous levels that this water occupies in the shadow allocation.

By definition of the water-filling algorithm all neighbors $w$ of $u$ must have been at level at least $(i+1)\Delta_{\mathrm{org}}$ when the node $u$ was allocated; moreover, $w_t(v) = \min\{1, 1/\alpha_r\} \ge w_t(u) = \min\{1, 1/\alpha_j\}$, since $r \le j$ and the $\alpha$'s are increasing.

Thus, we have that for each such $u \in S_j$

$$\text{contribution to rhs of (26)} = t \cdot \Delta \cdot w_t(v) \cdot w_s(u) = \Delta_{\mathrm{org}} \cdot w_s(u),$$

since the $\Delta_{\mathrm{org}}$ amount of water corresponds to $t$ slabs of size $\Delta$, and the contribution is then weighted by $w_t(v)$ in the rhs of (26). We now calculate the contribution of $u \in S_j$ to the lhs. We let each $u \in S_j$ contribute $M_j(u, w)$ to each $w \in T_j$, so that the total contribution to each $w$ is $1$ and the total contribution of each $u \in S_j$ is $\alpha_j$. Thus,

$$\text{contribution of } u \text{ to lhs of (26)} \ge \Delta_{\mathrm{org}} \cdot \min\{1, 1/\alpha_j\} \cdot \alpha_j,$$

since $u$ has $\alpha_j$ matches in $T_j$, whose contributions are weighted by $\min\{1, 1/\alpha_j\}$. As before, it remains to note that $\min\{1, 1/\alpha_j\} \cdot \alpha_j = \min\{1, \alpha_j\} = w_s(u)$. ■
We now get

Lemma 24 One has for all $x \ge 0$ and all $k \ge 1$

$$|M| - b^k(x) \le \int_0^x (b^k(s) - b^{k-1}(s))\,ds.$$

Proof: By Lemma 23 we have

$$b^k(x) \ge \int_x^\infty \sum_{v \in Q} w_t(v)\phi_v^k(s)\,ds.$$

Putting this together with (25) we get

$$|M| - b^k(x) \le \int_0^x \sum_{v \in Q} w_t(v)\phi_v^k(s)\,ds$$

for all $x \ge 0$ and $k \ge 1$. To complete the proof, we note that

$$\int_0^x \sum_{v \in Q} w_t(v)\phi_v^k(s)\,ds \le \int_0^x (b^k(s) - b^{k-1}(s))\,ds$$

for all $k \ge 1$ and $x \ge 0$, where we let $b^0 = 0$ for convenience. ■
We also need

Lemma 25 Algorithm 1 constructs a matching of size at least

$$\frac{1}{k}\int_0^k b^k(x)\,dx.$$

Proof: A vertex $v \in Q$ contributes $\frac{1}{k}\min\{k, l^k(v)\} \ge w_t(v) \cdot \frac{1}{k}\min\{k, l^k(v)/w_t(v)\}$ to the matching, implying that the size of the constructed matching is at least

$$\frac{1}{k}\int_0^k b^k(x)\,dx.$$

■

We now prove lower bounds on $b^k(x)$. Recall that for integer $k \ge 1$

$$F_k(x) = \sum_{i=0}^{k-1} e^{-x}\frac{x^i}{i!}. \quad (27)$$

Note that $1 - F_k(x)$ is the cdf of the Gamma distribution with scale $1$ and shape $k$.
Lemma 26 For every $k \ge 1$ one has for all $x \ge 0$

$$\int_0^x b^k(s)\,ds \ge |M| \cdot \int_0^x F_k(s)\,ds.$$

Proof: We prove the lemma by induction on $k$.

Base: $k = 1$. This follows immediately since by Lemma 24 one has

$$\int_0^x b^1(s)\,ds \ge |M| - b^1(x). \quad (28)$$

Letting $f(x) = \int_0^x b^1(s)\,ds$, we get that $f'(x) \ge |M| - f(x)$ for all $x \ge 0$, $f(0) = 0$ and $f'(0) = |M|$, which implies that $f(x) \ge |M| \cdot (1 - e^{-x})$, as required.

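Inductive step: $k - 1 \to k$. We need to prove that

$$\int_0^x b^k(s)\,ds \ge |M| \cdot \int_0^x F_k(s)\,ds. \quad (29)$$

By Lemma 24, for all $x \ge 0$

$$b^k(x) \ge |M| - \int_0^x (b^k(s) - b^{k-1}(s))\,ds \ge |M| - \int_0^x b^k(s)\,ds + |M| \cdot \int_0^x F_{k-1}(s)\,ds, \quad (30)$$

where we used the inductive hypothesis to replace $\int_0^x b^{k-1}(s)\,ds$ with $|M| \cdot \int_0^x F_{k-1}(s)\,ds$. Thus,

$$\int_0^x b^k(s)\,ds \ge |M| - b^k(x) + |M| \cdot \int_0^x F_{k-1}(s)\,ds. \quad (31)$$

Let $f(x) = \int_0^x b^k(s)\,ds$. We have from (31) that

$$f'(x) \ge |M| - f(x) + |M| \cdot \int_0^x F_{k-1}(s)\,ds, \quad f(0) = 0, \quad f'(0) = |M|.$$

By the standard comparison argument for the linear differential inequality $f' + f \ge h$, one has $f \ge \tilde f$, where $\tilde f$ solves the corresponding equation with equality and $\tilde f(0) = 0$. The derivative $g = \tilde f'$ is given by the solution of

$$g(x) = -g'(x) + |M| \cdot F_{k-1}(x), \quad g(0) = |M|. \quad (32)$$

The solution of (32) is given by

$$g(x) = e^{-x}\left[|M|\int_0^x e^s F_{k-1}(s)\,ds + |M|\right]. \quad (33)$$

Calculating the integral in (33) yields

$$\int_0^x e^s F_{k-1}(s)\,ds = \int_0^x e^s \int_s^\infty \frac{z^{k-2}e^{-z}}{(k-2)!}\,dz\,ds = \int_0^x \sum_{j=0}^{k-2}\frac{s^j}{j!}\,ds = \sum_{j=1}^{k-1}\frac{x^j}{j!}, \quad (34)$$

and hence $g(x) = |M| \cdot F_k(x)$. Thus, $\int_0^x b^k(s)\,ds = f(x) \ge \tilde f(x) = |M| \cdot \int_0^x F_k(s)\,ds$ as required.

■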
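The last step of the induction can be double-checked numerically. With $|M|$ normalised to $1$, the Python sketch below (our own check, using the closed form (27)) verifies $F_k(0) = 1$ and evaluates the residual of (32) for $g = F_k$ with a central finite difference. For $k = 1$ the empty sum gives $F_0 \equiv 0$, matching the base-case equation $g' = -g$.

```python
import math

def F(k, x):
    """F_k(x) = sum_{i<k} e^{-x} x^i / i!  (27); F_0 is the empty sum 0."""
    return sum(math.exp(-x) * x ** i / math.factorial(i) for i in range(k))

def ode_residual(k, x, h=1e-5):
    """g(x) + g'(x) - F_{k-1}(x) for g = F_k, i.e. the residual of (32) with |M| = 1."""
    g_prime = (F(k, x + h) - F(k, x - h)) / (2 * h)
    return F(k, x) + g_prime - F(k - 1, x)

worst = max(abs(ode_residual(k, x / 4)) for k in range(1, 6) for x in range(1, 41))
print(worst)
```

The residual vanishes up to finite-difference error because $F_k' = -e^{-x}x^{k-1}/(k-1)!$, so $F_k + F_k' = F_{k-1}$ exactly.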

Given Lemma 26 we immediately obtain

Theorem 27 Algorithm 1 achieves a $\left(1 - e^{-k}k^{k-1}/(k-1)!\right)$-approximation to maximum matchings in $k$ passes over the input stream.

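Proof: The approximation ratio is at least
$$\frac{1}{k|M|}\int_0^k b^k(x)\,dx \ge \frac{1}{k}\int_0^k F_k(x)\,dx = 1 - \frac{1}{k}\int_k^\infty F_k(x)\,dx,$$
where the last equality uses $\int_0^\infty F_k(x)\,dx = k$ (each of the $k$ terms of $F_k$ integrates to 1). Recalling that $F_k(x) = \sum_{j=0}^{k-1} e^{-x}x^j/j!$ and using integration by parts
$$\int_k^\infty e^{-x}\frac{x^j}{j!}\,dx = \left[-e^{-x}\frac{x^j}{j!}\right]_k^\infty + \int_k^\infty e^{-x}\frac{x^{j-1}}{(j-1)!}\,dx,$$
we get
$$\int_k^\infty e^{-x}\frac{x^j}{j!}\,dx = \sum_{i=0}^{j} e^{-k}\frac{k^i}{i!}.$$
Thus,
$$\int_k^\infty F_k(x)\,dx = \sum_{j=0}^{k-1}\int_k^\infty e^{-x}\frac{x^j}{j!}\,dx = \sum_{j=0}^{k-1}(k-j)e^{-k}\frac{k^j}{j!} = \sum_{j=0}^{k-1} e^{-k}\frac{k^{j+1}}{j!} - \sum_{j=1}^{k-1} e^{-k}\frac{k^j}{(j-1)!} = e^{-k}\frac{k^k}{(k-1)!},$$
and therefore
$$1 - \frac{1}{k}\int_k^\infty F_k(x)\,dx = 1 - \frac{e^{-k}k^{k-1}}{(k-1)!}. \tag{35}$$
■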
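The closed form in (35) is easy to check numerically. The Python sketch below (helper names ours) compares the sum $\sum_{j<k}(k-j)e^{-k}k^j/j!$ with $e^{-k}k^k/(k-1)!$ and evaluates the bound of Theorem 27 in log space to avoid overflow. As an extra check based on Stirling's formula (not stated in the text above), it also confirms that $e^{-k}k^{k-1}/(k-1)! = e^{-k}k^k/k!$ behaves like $1/\sqrt{2\pi k}$.

```python
import math

def tail_sum(k: int) -> float:
    """Middle expression of the proof: sum_{j<k} (k-j) e^{-k} k^j / j!."""
    return sum((k - j) * math.exp(-k) * k**j / math.factorial(j) for j in range(k))

def ratio(k: int) -> float:
    """Theorem 27 bound 1 - e^{-k} k^{k-1} / (k-1)!, evaluated in log space."""
    return 1.0 - math.exp(-k + (k - 1) * math.log(k) - math.lgamma(k))

for k in range(1, 31):
    closed = math.exp(-k) * k**k / math.factorial(k - 1)  # e^{-k} k^k / (k-1)!
    assert abs(tail_sum(k) - closed) <= 1e-12 * closed
    assert abs((1.0 - ratio(k)) - closed / k) <= 1e-12

# Stirling: k! = sqrt(2 pi k) (k/e)^k e^{r_k} with 0 < r_k < 1/(12k),
# so (1 - ratio(k)) * sqrt(2 pi k) lies strictly between 1 - 1/(12k) and 1.
for k in (10, 100, 1000):
    scaled = (1.0 - ratio(k)) * math.sqrt(2 * math.pi * k)
    assert 1 - 1 / (12 * k) < scaled < 1
```

For $k = 1$ the bound is $1 - 1/e$, and for $k = 2$ it is $1 - 2e^{-2} \approx 0.729$.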



5 Gap-existence 

In this section we show how our techniques yield an efficient algorithm for Gap-existence, thereby proving Theorem [?]. Recall that we are given a graph $G = (A, I, E)$ and integral budgets $B_a$. Note that integral budgets can be simulated implicitly by creating $B_a$ copies of $a$ for all $a \in A$. For simplicity, this is the approach that we take.

We now present a discretized version of Algorithm 1. We will explicitly maintain a subset $I^* \subseteq I$ of size $O(|A|/\epsilon)$, relying on the following two oracles:

1. an oracle LIST-NEIGHBORS(a, I*) that, given a node $a \in A$ and a set $I^*$, outputs the set of nodes $I^{**} \subseteq I^*$ that $a$ is connected to;

2. an oracle NEW-NEIGHBOR(a, I*) that, given any set $I^* \subseteq I$, outputs any node $i \in I \setminus I^*$ that $a$ is connected to, or $\emptyset$ if all neighbors of $a$ are in $I^*$.
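To make the oracle interface concrete, here is a minimal sketch assuming the graph is available as adjacency sets for the $A$ side. This representation is an assumption for illustration only; the text does not say how the oracles are realized. The names `list_neighbors` and `new_neighbor` are hypothetical, and `None` plays the role of the empty answer of NEW-NEIGHBOR.

```python
from typing import Dict, Optional, Set

# Hypothetical representation (not from the text): each a in A maps to
# the set of its neighbours in I.
Adjacency = Dict[str, Set[str]]

def list_neighbors(adj: Adjacency, a: str, i_star: Set[str]) -> Set[str]:
    """LIST-NEIGHBORS(a, I*): the subset I** of I* that a is connected to."""
    return adj[a] & i_star

def new_neighbor(adj: Adjacency, a: str, i_star: Set[str]) -> Optional[str]:
    """NEW-NEIGHBOR(a, I*): some neighbour of a outside I*, or None when every
    neighbour of a is already in I*. Returning the smallest label is just one
    deterministic way of picking 'any' such node."""
    outside = adj[a] - i_star
    return min(outside) if outside else None

# Usage: each call adds at most one new vertex, as in I* <- I* union NEW-NEIGHBOR(a, I*).
adj = {"a1": {"i1", "i2", "i3"}, "a2": {"i3"}}
i_star: Set[str] = set()
first = new_neighbor(adj, "a1", i_star)
if first is not None:
    i_star.add(first)
assert i_star == {"i1"}
assert list_neighbors(adj, "a1", i_star) == {"i1"}
```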

Algorithm 2: DISCRETIZED-WATERFILLING(G, a, ε, k)

  I* ← ∅
  while there exists a neighbor i of a in I* with level < (ε/4)k do
    Allocate water to i until it is at level (ε/4)k
  end while
  I* ← I* ∪ NEW-NEIGHBOR(a, I*)
  Perform water filling on neighbors in I*.
  REMOVE-CYCLES(G')
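The water-filling step used by Algorithm 2 can be sketched as follows, assuming "water filling" means pouring one unit of water into the chosen neighbors so that the lowest levels rise together to a common level. This is a standard continuous water-filling routine written for illustration; the function name `water_fill` and the dictionary of levels are hypothetical, and the discretization details of Algorithm 2 (the $(\epsilon/4)k$ threshold, REMOVE-CYCLES) are not modeled.

```python
from typing import Dict, Iterable, Optional

def water_fill(levels: Dict[str, float], nbrs: Iterable[str],
               amount: float = 1.0) -> Optional[float]:
    """Pour `amount` units of water into the vertices `nbrs`, always raising the
    currently lowest ones, so that every receiving vertex ends at a common level L
    (vertices already above L are untouched). Updates `levels` in place and
    returns L, or None if `nbrs` is empty."""
    targets = sorted(set(nbrs), key=lambda i: levels[i])
    if not targets:
        return None
    vals = [levels[i] for i in targets]
    for m in range(1, len(vals) + 1):
        # Common level reached if only the m lowest vertices receive water.
        level = (sum(vals[:m]) + amount) / m
        if m == len(vals) or level <= vals[m]:
            break
    for i in targets:
        levels[i] = max(levels[i], level)
    return level

levels = {"i1": 0.0, "i2": 0.5, "i3": 2.0}
L = water_fill(levels, ["i1", "i2", "i3"])
assert abs(L - 0.75) < 1e-12
assert levels == {"i1": 0.75, "i2": 0.75, "i3": 2.0}
```

Choosing the smallest $m$ with a feasible common level guarantees that exactly `amount` units are added, since the $m$ lowest vertices are raised to that level and the rest already lie above it.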



First we prove

Lemma 28 The space used by Algorithm 2 is $O(|A|/\epsilon)$.

Proof: Call a vertex saturated if the amount of water in it is at least $\epsilon k$. The number of saturated vertices is $O(|A|/\epsilon)$, since there are $k|A|$ units of water in the system and each saturated vertex accounts for at least $\epsilon k$ of them. We say that an unsaturated vertex $i$ belongs to $a \in A$ if $i$ was added to $I^*$ when NEW-NEIGHBOR was called from $a$. Note that for each $a \in A$ only one $i \in I$ belongs to $a$. Thus, this amounts to at most $|A|$ additional vertices. ■
Our algorithm for Gap-Existence is as follows:

Algorithm 3: GAP-EXISTENCE(G, ε)

  Run DISCRETIZED-WATERFILLING(G) with k = O(log(|I| · Σ_{a∈A} B_a) / ε²).
  Let G' denote the support of the fractional solution.
  Output YES if a complete matching with budgets ⌊(1 − ε)B_a⌋ exists in G', NO otherwise.
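The final test in Algorithm 3 reduces to ordinary bipartite matching by the copying trick mentioned at the start of this section: replace each $a$ by $\lfloor(1-\epsilon)B_a\rfloor$ copies and ask whether all copies can be matched in $G'$. The sketch below does this with Kuhn's augmenting-path algorithm, which is our choice (any maximum bipartite matching algorithm would do); `has_complete_budgeted_matching` and the dictionary representation of $G'$ are hypothetical.

```python
import math
from typing import Dict, Hashable, Set, Tuple

def has_complete_budgeted_matching(support: Dict[Hashable, Set[Hashable]],
                                   budgets: Dict[Hashable, int], eps: float) -> bool:
    """True iff every a in A can be matched to floor((1 - eps) * B_a) distinct
    neighbours in the support graph, each i in I being used at most once."""
    # One copy of a per unit of its reduced budget.
    copies = [(a, c) for a, b in budgets.items()
              for c in range(math.floor((1 - eps) * b))]
    match_of_i: Dict[Hashable, Tuple[Hashable, int]] = {}  # i -> copy matched to it

    def try_augment(copy: Tuple[Hashable, int], seen: Set[Hashable]) -> bool:
        """Search for an augmenting path starting at `copy` (Kuhn's algorithm)."""
        for i in support.get(copy[0], ()):
            if i in seen:
                continue
            seen.add(i)
            if i not in match_of_i or try_augment(match_of_i[i], seen):
                match_of_i[i] = copy
                return True
        return False

    # Kuhn's algorithm matches a maximum number of copies; if one copy fails,
    # no complete matching exists, so stopping early is safe.
    return all(try_augment(cp, set()) for cp in copies)
```

Usage: with support `{"a1": {"i1", "i2"}, "a2": {"i2"}}` and budgets `{"a1": 2, "a2": 1}`, the answer is NO for $\epsilon = 0$ (three copies compete for two vertices) and YES for $\epsilon = 1/2$, where only one copy of `a1` remains.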



We now assume that we are in the YES case and prove that the algorithm will find a matching with budgets $\lfloor(1-\epsilon)B_a\rfloor$. We refer to vertices $i \in I$ that have a nonzero amount of water as active. Let $p_i = 1$ for active vertices and $p_i = 0$ otherwise. Abusing notation somewhat, for an active vertex $i \in I$ let $l^k(i)$ denote the level of water in $i$ minus $\epsilon k$, and $0$ otherwise. The Gap-Existence case is in fact somewhat simpler than the general case of approximating matchings that we just discussed, so we will use the more lightweight techniques from the analysis of the simple case for matchings.

For each $k \ge 1$ and all $x \ge 0$ denote by $b^k(x)$ the number of vertices in $I$ that have load at least $x + \epsilon k$ after $k$ passes. We start by pointing out some useful properties of the function $b^k(x)$. First, note that $b^k(0) \le |I|$, $b^k(x)$ is non-increasing in $x$, and $b^k(x) - b^{k-1}(x) \ge 0$ for all $x$. Recall that we are interested in recovering a $(1-\epsilon/2)$-matching of the $A$ side. To do that, we scale all allocations by $1 - \epsilon/2$. The total amount of water, after scaling, satisfies
$$(1-\epsilon/2)(\epsilon/4)k\sum_{i\in I} p_i + (1-\epsilon/2)\int_0^\infty b^k(x)\,dx = (1-\epsilon/2)k|M|, \tag{36}$$
since every vertex $a \in A$ contributed $k$ units of water, one in each pass, amounting to $k|M|$ units of water overall, except for the water that was allocated below $\epsilon k$; the left-hand side of (36) is the sum of loads on all $i \in I$.
Furthermore, note that the size of the matching constructed by the algorithm after $k$ passes is at least
$$(1-\epsilon/2)(\epsilon/4)\sum_{i\in I} p_i + \frac{1-\epsilon/2}{k}\int_0^{k(1-\epsilon/4)/(1-\epsilon/2)} b^k(x)\,dx, \tag{37}$$
since every vertex $i \in I$ with load $x$ contributes at least $\frac{1}{k}\min\{k(1-\epsilon/4), x\}$ to the matching before scaling, and hence at least $\frac{1-\epsilon/2}{k}\min\{k(1-\epsilon/4)/(1-\epsilon/2), x\}$ after scaling. Hence the approximation ratio after $k$ passes, measured against $(1-\epsilon/2)|M|$, is at least
$$1 - \frac{1}{k|M|}\int_{k(1-\epsilon/4)/(1-\epsilon/2)}^{\infty} b^k(x)\,dx, \tag{38}$$
where we used (36) to convert (37) into (38). Thus, it is sufficient to upper bound $\int_{k(1-\epsilon/4)/(1-\epsilon/2)}^{\infty} b^k(x)\,dx$ in order to analyze the approximation ratio, and we turn to bounding this quantity.

Lemma 29 One has, for all $k \ge 1$,
$$b^k(x) \ge \int_x^\infty \left(b^k(s) - b^{k-1}(s)\right)ds \tag{39}$$
for all $x \ge 0$, where $b^0 \equiv 0$.
Proof: For each vertex $a \in A$ consider its match $M(a)$. If $a$ ended up allocating water at level at least $x$ during the $k$-th pass, its match $M(a)$ must have been at level at least $x$ when $a$ arrived. Together with the fact that levels are monotone increasing, this gives the result. We omit the details since they would essentially repeat the proof of Lemma 23, with minor changes due to the absence of weights $wt$ on the $I$ side. ■
We now get

Lemma 30 For all $k \ge 1$ and all $x \ge 0$,
$$\int_x^\infty b^k(s)\,ds \le |I|\cdot\int_x^\infty F_k(s)\,ds. \tag{40}$$



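Proof: We prove the lemma by induction on $k$.

Base: $k = 1$. We prove the statement by contradiction. Suppose that
$$\int_{x_0}^\infty b^1(s)\,ds > |I|\int_{x_0}^\infty e^{-s}\,ds = |I|e^{-x_0} \tag{41}$$
for some $x_0 > 0$. Recall that by Lemma 29 one has
$$b^1(x) \ge \int_x^\infty b^1(s)\,ds \tag{42}$$
for all $x \ge 0$. Let $g(x) = \left(\int_{x_0}^\infty b^1(s)\,ds\right)e^{-x+x_0}$ for $x \in [0, x_0]$. Then $g$ satisfies $-g'(x) = g(x)$, the equality version of (42) for $G(x) := \int_x^\infty b^1(s)\,ds$, which satisfies $-G'(x) = b^1(x) \ge G(x)$. Hence $\left(e^x(G(x) - g(x))\right)' \le 0$, and since $G(x_0) = g(x_0)$ we get $G(x) \ge g(x)$ for all $x \in [0, x_0]$. But then, by (42) and (41), $b^1(0) \ge G(0) \ge g(0) = \left(\int_{x_0}^\infty b^1(s)\,ds\right)e^{x_0} > |I|$, a contradiction with $b^1(0) \le |I|$.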

Inductive step: $k-1 \to k$. We need to prove that
$$\int_x^\infty b^k(s)\,ds \le |I|\cdot\int_x^\infty F_k(s)\,ds. \tag{43}$$
Recall that by Lemma 29, for all $x \ge 0$,
$$b^k(x) \ge \int_x^\infty \left(b^k(s) - b^{k-1}(s)\right)ds \ge \int_x^\infty b^k(s)\,ds - |I|\cdot\int_x^\infty F_{k-1}(s)\,ds, \tag{44}$$
where we used the inductive hypothesis to replace $\int_x^\infty b^{k-1}(s)\,ds$ with $|I|\cdot\int_x^\infty F_{k-1}(s)\,ds$. Fix any point $x_0 > 0$ and denote
$$\gamma := \int_{x_0}^\infty b^k(s)\,ds.$$
We will show that one necessarily has $\gamma \le |I|\cdot\int_{x_0}^\infty F_k(s)\,ds$.

Writing $G(x) := \int_x^\infty b^k(s)\,ds$, so that $-G'(x) = b^k(x)$, it follows from (44) that on $[0, x_0]$ the function $G$ is lower bounded by the solution $g$ of
$$-g'(x) = g(x) - |I|\cdot\int_x^\infty F_{k-1}(s)\,ds, \qquad g(x_0) = \gamma. \tag{45}$$
(Indeed, $D := G - g$ satisfies $(e^x D(x))' \le 0$ and $D(x_0) = 0$.) The solution of (45) is given by
$$g(x) = |I|\cdot\int_x^\infty F_k(s)\,ds + c\,e^{-x}, \tag{46}$$
where the constant $c$ depends on $\gamma$. To verify (46), note that
$$\int_x^\infty \left(F_k(s) - F_{k-1}(s)\right)ds = \int_x^\infty \frac{s^{k-1}e^{-s}}{(k-1)!}\,ds = F_k(x), \tag{47}$$
so the first term of (46) satisfies (45) with equality, while $c\,e^{-x}$ solves the homogeneous equation $-g' = g$. Moreover, evaluating (44) at $x = 0$ and using $b^k(0) \le |I|$ together with $\int_0^\infty F_{k-1}(s)\,ds = k-1$ gives $G(0) \le |I| + (k-1)|I| = k|I| = |I|\cdot\int_0^\infty F_k(s)\,ds$. Since $g(0) \le G(0)$, we have $c = g(0) - k|I| \le 0$. In particular, it follows that
$$\gamma = g(x_0) = |I|\cdot\int_{x_0}^\infty F_k(s)\,ds + c\,e^{-x_0} \le |I|\cdot\int_{x_0}^\infty F_k(s)\,ds,$$
completing the proof of the inductive step.

■ 
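A quick way to see where the bound of Lemma 30 comes from is that the profile $b^k = |I|\cdot F_k$ turns (39) into an equality, i.e. $\int_x^\infty (F_k(s) - F_{k-1}(s))\,ds = F_k(x)$. The Python sketch below (helper names ours) checks this identity numerically using the closed form $\int_x^\infty F_k(s)\,ds = \sum_{i<k}(k-i)e^{-x}x^i/i!$, which follows from the same integration by parts as in the proof of Theorem 27, and also checks $\int_0^\infty F_k = k$.

```python
import math

def F(k: int, x: float) -> float:
    """F_k(x) from (27); F_0 = 0."""
    return sum(math.exp(-x) * x**i / math.factorial(i) for i in range(k))

def Phi(k: int, x: float) -> float:
    """Closed form of int_x^inf F_k(s) ds = sum_{i<k} (k-i) e^{-x} x^i / i!."""
    return sum((k - i) * math.exp(-x) * x**i / math.factorial(i) for i in range(k))

def check_extremal(k: int, xs=(0.0, 0.3, 1.0, 2.5, 6.0), h: float = 1e-5) -> None:
    for x in xs:
        # int_x^inf (F_k - F_{k-1}) = F_k(x): equality in (39) for b^j = F_j.
        assert abs((Phi(k, x) - Phi(k - 1, x)) - F(k, x)) < 1e-12
        # -d/dx int_x^inf F_k = F_k(x), via a central difference.
        if x >= h:
            deriv = (Phi(k, x + h) - Phi(k, x - h)) / (2 * h)
            assert abs(-deriv - F(k, x)) < 1e-7
    assert abs(Phi(k, 0.0) - k) < 1e-12   # int_0^inf F_k = k

for k in range(1, 8):
    check_extremal(k)
```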

We are now ready to prove correctness. Suppose that we are in the YES case, i.e., there exists a complete matching with budgets $B_a$. Consider the fractional allocation returned by DISCRETIZED-WATERFILLING(G), and multiply it by $(1-\epsilon/2)/k$. Recalling that each active vertex can take at least $1 - \epsilon/4$ units of water, we get that every vertex $i \in I$ now contributes $\frac{1-\epsilon/2}{k}\min\{k(1-\epsilon/4)/(1-\epsilon/2), l^k(i)\}$ to the matching.

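Thus, Lemma 30 together with (38) shows that the amount of water lost is at most
$$\frac{1-\epsilon/2}{k}\int_{k(1-\epsilon/4)/(1-\epsilon/2)}^{\infty} b^k(x)\,dx \le (1-\epsilon/2)|I|\cdot\frac{1}{k}\int_{k(1-\epsilon/4)/(1-\epsilon/2)}^{\infty} F_k(x)\,dx. \tag{48}$$
We will need

Lemma 31 For all $k \ge 1$ and $\epsilon^* > 0$,
$$\frac{1}{k}\int_{k(1+\epsilon^*)}^{\infty} F_k(x)\,dx \le e^{-\epsilon^* k}(1+\epsilon^*)^k\cdot\frac{e^{-k}k^{k-1}}{(k-1)!}.$$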

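Proof: Recalling that $F_k(x) = \sum_{j=0}^{k-1} e^{-x}x^j/j!$ and using integration by parts
$$\int_a^\infty e^{-x}\frac{x^j}{j!}\,dx = \left[-e^{-x}\frac{x^j}{j!}\right]_a^\infty + \int_a^\infty e^{-x}\frac{x^{j-1}}{(j-1)!}\,dx,$$
we get
$$\int_{k(1+\epsilon^*)}^{\infty} F_k(x)\,dx = \sum_{j=0}^{k-1}\int_{k(1+\epsilon^*)}^{\infty} e^{-x}\frac{x^j}{j!}\,dx = \sum_{j=0}^{k-1}(k-j)\,e^{-k(1+\epsilon^*)}\frac{(k(1+\epsilon^*))^j}{j!}$$
$$\le e^{-\epsilon^* k}(1+\epsilon^*)^k\sum_{j=0}^{k-1}(k-j)\,e^{-k}\frac{k^j}{j!} = e^{-\epsilon^* k}(1+\epsilon^*)^k\cdot\frac{e^{-k}k^k}{(k-1)!}, \tag{49}$$
where the inequality uses $(1+\epsilon^*)^j \le (1+\epsilon^*)^k$ for $j \le k-1$, and the last equality is the computation from the proof of Theorem 27. Dividing by $k$ gives the claim. ■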
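Lemma 31 and the second inequality in (50) can be spot-checked numerically. The sketch below (helper names ours) evaluates $\frac{1}{k}\int_{k(1+\epsilon^*)}^\infty F_k$ in closed form, compares it with the right-hand side of Lemma 31, and checks that this right-hand side is at most $e^{-(\epsilon^*)^2k/3}$ for $\epsilon^* \le 1/2$. The restriction $\epsilon^* \le 1/2$ is our concrete reading of "sufficiently small", based on $\epsilon^* - \ln(1+\epsilon^*) \ge (\epsilon^*)^2/2 - (\epsilon^*)^3/3$.

```python
import math

def Phi(k: int, a: float) -> float:
    """int_a^inf F_k(x) dx in closed form: sum_{j<k} (k-j) e^{-a} a^j / j!."""
    return sum((k - j) * math.exp(-a) * a**j / math.factorial(j) for j in range(k))

def lemma31_rhs(k: int, e: float) -> float:
    """e^{-e k} (1+e)^k * e^{-k} k^{k-1} / (k-1)!, evaluated in log space."""
    return math.exp(-e * k + k * math.log1p(e) - k
                    + (k - 1) * math.log(k) - math.lgamma(k))

for k in range(1, 61):
    for e in (0.01, 0.05, 0.1, 0.25, 0.5):
        lhs = Phi(k, k * (1 + e)) / k
        assert lhs <= lemma31_rhs(k, e) * (1 + 1e-12)
        # Second step of (50): e^{-k(e - ln(1+e))} * [Poisson pmf <= 1] <= e^{-e^2 k / 3}.
        assert lemma31_rhs(k, e) <= math.exp(-e * e * k / 3)
```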
Let $\epsilon^* = (1-\epsilon/4)/(1-\epsilon/2) - 1$, so that $k(1+\epsilon^*) = k(1-\epsilon/4)/(1-\epsilon/2)$. By Lemmas 30 and 31 we have
$$\frac{1}{k}\int_{k(1+\epsilon^*)}^{\infty} b^k(x)\,dx \le |I|\,e^{-k(\epsilon^* - \ln(1+\epsilon^*))}\cdot\left[\frac{e^{-k}k^{k-1}}{(k-1)!}\right] \le |I|\,e^{-(\epsilon^*)^2k/3} \tag{50}$$
for sufficiently small $\epsilon^* > 0$. Also note that $\epsilon/5 < \epsilon^* < \epsilon$ for sufficiently small $\epsilon > 0$. Hence, letting $k = \gamma\log(|I|\cdot\sum_{a\in A}B_a)/\epsilon^2$ for a sufficiently large constant $\gamma > 0$ yields a $\left(1 - (\sum_{a\in A}B_a)^{-2}\right)$-approximate fractional matching with budgets $\lfloor(1-\epsilon)B_a\rfloor$. We now argue that the set of edges that this fractional matching is supported on admits a complete matching with budgets $\lfloor(1-\epsilon)B_a\rfloor$. We will need

Lemma 32 Let $G = (P, Q, E)$ denote a bipartite graph. Suppose that there exists a fractional matching of size $|P|(1 - |P|^{-2})$ in $G$. Then the support of the fractional matching contains a perfect matching of the $P$ side.

Proof: Consider the subgraph $G'$ that supports a fractional $(1 - |P|^{-2})$-matching. Recall that a graph supports a fractional $\alpha$-matching of the $P$ side iff $|\Gamma(S)| \ge \alpha|S|$ for all $S \subseteq P$. Now note that the ratio $|\Gamma(S)|/|S|$ is a rational number of the form $i/j$ where $j \le |P|$. The existence of the fractional matching implies that $|\Gamma(S)|/|S| \ge 1 - |P|^{-2}$ for all $S \subseteq P$. Since $|\Gamma(S)|/|S|$ has denominator at most $|P|$, any such ratio below 1 is at most $1 - 1/|P| < 1 - |P|^{-2}$ (for $|P| \ge 2$). This implies that in fact $|\Gamma(S)| \ge |S|$ for all $S \subseteq P$, and Hall's theorem yields the desired perfect matching. ■
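The rounding step in the proof of Lemma 32 is elementary but easy to get wrong at the boundary, so here is a small exact-arithmetic check (helper names ours): for $|P| \ge 2$, every ratio $i/j$ with $j \le |P|$ that is at least $1 - |P|^{-2}$ is already at least 1. Note that $|P| = 1$ must be excluded, since then $1 - |P|^{-2} = 0$. A brute-force test of Hall's condition is included for tiny examples; it is exponential in $|P|$ and only meant for illustration.

```python
from fractions import Fraction
from itertools import combinations
from typing import Dict, Hashable, Set

def rounding_step_holds(P: int) -> bool:
    """Exact check: any ratio i/j with 1 <= j <= P and i/j >= 1 - 1/P^2 is >= 1.
    Numerators above 2j are skipped, since those ratios are trivially >= 1."""
    threshold = 1 - Fraction(1, P * P)
    return all(Fraction(i, j) >= 1
               for j in range(1, P + 1)
               for i in range(0, 2 * j + 1)
               if Fraction(i, j) >= threshold)

def satisfies_hall(adj: Dict[Hashable, Set[Hashable]]) -> bool:
    """Brute-force Hall's condition |Gamma(S)| >= |S| for every nonempty S of the P side."""
    p_side = list(adj)
    for r in range(1, len(p_side) + 1):
        for S in combinations(p_side, r):
            if len(set().union(*(adj[p] for p in S))) < r:
                return False
    return True

assert all(rounding_step_holds(P) for P in range(2, 13))
assert not rounding_step_holds(1)   # |P| = 1 is a genuine exception
```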
Since the budgets $\lfloor(1-\epsilon)B_a\rfloor$ are integral, finding a complete matching with budgets $\lfloor(1-\epsilon)B_a\rfloor$ is equivalent to finding a complete matching in a graph with $\sum_{a\in A}\lfloor(1-\epsilon)B_a\rfloor$ vertices on the $A$ side. Lemma 32 now implies the existence of a complete matching in the set of edges that the fractional matching is supported on. This completes the proof of Theorem [?].
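To illustrate the choice $k = \gamma\log(|I|\cdot\sum_{a\in A}B_a)/\epsilon^2$, the sketch below plugs concrete numbers into the estimate, assuming the normalized loss is bounded by $|I|\,e^{-(\epsilon^*)^2k/3}$ as in (50) and using $\epsilon^* > \epsilon/5$. The constant $\gamma = 225$ is our own illustrative choice (with natural logarithms): it makes the exponent at least $3\ln(|I|\sum_a B_a)$, so the bound is at most $|I|^{-2}(\sum_a B_a)^{-3} \le (\sum_a B_a)^{-2}$. The text itself only asks for a sufficiently large constant, and the helper names are hypothetical.

```python
import math

def passes_needed(eps: float, n_I: int, total_budget: int, gamma: float = 225.0) -> int:
    """k = ceil(gamma * ln(|I| * sum_a B_a) / eps^2); gamma = 225 is illustrative."""
    return math.ceil(gamma * math.log(n_I * total_budget) / eps ** 2)

def eps_star(eps: float) -> float:
    """eps* = (1 - eps/4) / (1 - eps/2) - 1, as defined before (50)."""
    return (1 - eps / 4) / (1 - eps / 2) - 1

def loss_bound(eps: float, n_I: int, k: int) -> float:
    """|I| * exp(-(eps*)^2 k / 3), the assumed form of the bound in (50)."""
    return n_I * math.exp(-eps_star(eps) ** 2 * k / 3)

for eps in (0.05, 0.1, 0.2):
    assert eps / 5 < eps_star(eps) < eps
    for n_I in (10, 1000):
        for total in (10, 10 ** 4):
            k = passes_needed(eps, n_I, total)
            assert loss_bound(eps, n_I, k) <= total ** -2.0
```

The number of passes grows only logarithmically in $|I|\cdot\sum_a B_a$, while the $1/\epsilon^2$ factor dominates for small $\epsilon$.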

6 Acknowledgements 

The author is grateful to Nikhil Devanur for bringing the Gap-Existence problem to his attention.






References 

[1] K. Ahn and S. Guha. Linear programming in the semi-streaming model with application to the maximum matching problem. ICALP, pages 526-538, 2011.

[2] K. Ahn and S. Guha. Linear programming in the semi-streaming model with application to the maximum matching problem. CoRR, abs/1104.2315, 2011.

[3] Denis Xavier Charles, Max Chickering, Nikhil R. Devanur, Kamal Jain, and Manan Sanghi. Fast algorithms for finding matchings in lopsided bipartite graphs with applications to display ads. ACM Conference on Electronic Commerce, pages 121-128, 2010.

[4] Sebastian Eggert, Lasse Kliemann, and Anand Srivastav. Bipartite graph matchings in the semi-streaming model. ESA, pages 492-503, 2009.

[5] Joan Feigenbaum, Sampath Kannan, Andrew McGregor, Siddharth Suri, and Jian Zhang. Graph distances in the streaming model: the value of space. SODA, pages 745-754, 2005.

[6] Joan Feigenbaum, Sampath Kannan, Andrew McGregor, Siddharth Suri, and Jian Zhang. On graph problems in a semi-streaming model. Theor. Comput. Sci., 348:207-216, 2005.

[7] E. Fischer, E. Lehman, I. Newman, S. Raskhodnikova, R. Rubinfeld, and A. Samorodnitsky. Monotonicity testing over general poset domains. STOC, 2002.

[8] A. Goel, M. Kapralov, and S. Khanna. On the communication and streaming complexity of maximum bipartite matching. http://www.stanford.edu/~kapralov/papers/matching-covers-full.pdf (preliminary version appeared in SODA 2012).

[9] Chinmay Karande, Aranyak Mehta, and Pushkar Tripathi. Online bipartite matching with unknown distributions. STOC, pages 587-596, 2011.

[10] R. Karp, U. Vazirani, and V. Vazirani. An optimal algorithm for online bipartite matching. STOC, 1990.

[11] Christian Konrad, Frédéric Magniez, and Claire Mathieu. Maximum matching in semi-streaming with few passes. CoRR, abs/1112.0184, 2011.

[12] V. I. Levenstein. Upper bounds for codes with a fixed weight of vectors (in Russian). Problems of Information Transmission, pages 3-12, 1971.

[13] Mohammad Mahdian and Qiqi Yan. Online bipartite matching with random arrivals: an approach based on strongly factor-revealing LPs. STOC, pages 597-606, 2011.

[14] A. McGregor. Finding graph matchings in data streams. APPROX-RANDOM, pages 170-181, 2005.

[15] Rajeev Motwani, Rina Panigrahy, and Ying Xu. Fractional matching via balls-and-bins. APPROX-RANDOM, pages 487-498, 2006.






